NONPARAMETRIC ESTIMATION OF MULTIVARIATE
CONVEX-TRANSFORMED DENSITIES 

By Arseni Seregin and Jon A. Wellner

University of Washington

We study estimation of multivariate densities $p$ of the form $p(x) = h(g(x))$
for $x \in \mathbb{R}^d$, for a fixed function $h$ and an unknown convex
function $g$. The canonical example is $h(y) = e^{-y}$ for $y \in \mathbb{R}$; in this
case the resulting class of densities

\[ \mathcal{P}(e^{-y}) = \{ p = \exp(-g) : g \text{ is convex} \} \]

is well known as the class of log-concave densities. Other functions
$h$ allow for classes of densities with heavier tails than the
log-concave class.

We first investigate when the MLE $\hat{p}$ exists for the class $\mathcal{P}(h)$ for
various choices of monotone transformations $h$, including decreasing
and increasing functions $h$. The resulting models for increasing
transformations $h$ extend the classes of log-convex densities studied
previously in the econometrics literature, corresponding to $h(y) = \exp(y)$.

We then establish consistency of the MLE for fairly general
functions $h$, including the log-concave class $\mathcal{P}(e^{-y})$ and many others. In
a final section we provide asymptotic minimax lower bounds for
estimation of $p$ and its vector of derivatives at a fixed point $x_0$ under
natural smoothness hypotheses on $h$ and $g$. The proofs rely heavily
on results from convex analysis.

1. Introduction and Background. 

1.1. Log-concave and r-concave densities. A probability density $p$ on $\mathbb{R}^d$
is called log-concave if it can be written as

\[ p(x) = \exp(-g(x)) \]

for some convex function $g : \mathbb{R}^d \to (-\infty, \infty]$. We let $\mathcal{P}(e^{-y})$ denote the
class of all log-concave densities on $\mathbb{R}^d$. As shown by Ibragimov [1956], a
density function $p$ on $\mathbb{R}$ is log-concave if and only if its convolution with
any unimodal density is again unimodal.

Log-concave densities have proved to be useful in a wide range of
statistical problems: see Walther [2010] for a survey of recent developments
and statistical applications of log-concave densities on $\mathbb{R}$ and $\mathbb{R}^d$, and
see Cule, Samworth and Stewart [2007] for several interesting applications
of estimators of such densities in $\mathbb{R}^d$.

Because the class of multivariate log-concave densities contains the class of 
multivariate normal densities and is preserved under a number of important 
operations (such as convolution and marginalization) , it serves as a valuable 
nonparametric surrogate or replacement for the class of normal densities. 
Further study of the class of log-concave densities from this perspective has been
given by Schuhmacher, Hüsler and Duembgen [2009]. On the analysis side,
various isoperimetric and Poincaré type inequalities have been obtained by
Bobkov [1999, 2007b], Fougères [2005], and Milman and Sodin [2008].

Log-concave densities have the slight drawback that their tails must
decrease exponentially, so a number of authors, including Koenker and
Mizera [2008], have proposed using generalizations of the log-concave family
involving r-concave densities, defined as follows. For $a, b \in \mathbb{R}$, $r \in \mathbb{R}$, and
$\lambda \in (0, 1)$, define the generalized mean of order $r$, $M_r(a, b; \lambda)$, by

\[
M_r(a, b; \lambda) =
\begin{cases}
((1 - \lambda) a^r + \lambda b^r)^{1/r}, & r \neq 0,\ a, b > 0, \\
0, & r < 0,\ ab = 0, \\
a^{1 - \lambda} b^{\lambda}, & r = 0.
\end{cases}
\]

Then a density function $p$ is r-concave on $C \subset \mathbb{R}^d$ if and only if

\[ p((1 - \lambda) x + \lambda y) \ge M_r(p(x), p(y); \lambda) \quad \text{for all } x, y \in C,\ \lambda \in (0, 1). \]

We denote the class of all r-concave densities on $C \subset \mathbb{R}^d$ by $\mathcal{P}(y_+^{1/r}; C)$,
and write $\mathcal{P}(y_+^{1/r})$ when $C = \mathbb{R}^d$. As noted by Dharmadhikari and
Joag-Dev [1988], page 86, for $r < 0$ it suffices to consider $\mathcal{P}(y_+^{1/r})$, and it is
almost immediate from the definitions that $p \in \mathcal{P}(y_+^{1/r})$ if and only if
$p(x) = (g(x))^{1/r}$ for some convex function $g$ from $\mathbb{R}^d$ to $[0, \infty)$. For $r > 0$,
$p \in \mathcal{P}(y_+^{1/r}; C)$ if and only if $p(x) = (g(x))^{1/r}$, where $g$ mapping $C$ into $(0, \infty)$ is
concave.
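
To make the definition concrete, the following minimal Python sketch implements the generalized mean $M_r$ and tests the r-concavity inequality on a finite grid. The helper names (`generalized_mean`, `is_r_concave_on_grid`, `cauchy_density`), the grid and the tolerance are illustrative assumptions and not part of the paper; a grid check of this kind can refute r-concavity but cannot prove it. The standard Cauchy density serves as the test case because $p^{-1/2}$ is proportional to $\sqrt{1 + x^2}$, which is convex.

```python
import math


def generalized_mean(a, b, lam, r):
    """Generalized mean M_r(a, b; lam) for a, b >= 0 and 0 < lam < 1."""
    if r == 0:
        return a ** (1.0 - lam) * b ** lam      # geometric mean (r = 0)
    if r < 0 and a * b == 0:
        return 0.0                              # convention for r < 0, ab = 0
    return ((1.0 - lam) * a ** r + lam * b ** r) ** (1.0 / r)


def is_r_concave_on_grid(p, r, grid, lams=(0.25, 0.5, 0.75), rel_tol=1e-9):
    """Test p((1-lam)x + lam*y) >= M_r(p(x), p(y); lam) on all grid pairs."""
    for x in grid:
        for y in grid:
            for lam in lams:
                mid = (1.0 - lam) * x + lam * y
                bound = generalized_mean(p(x), p(y), lam, r)
                if p(mid) < bound * (1.0 - rel_tol):
                    return False
    return True


def cauchy_density(x):
    """Standard Cauchy density, used here only as a heavy-tailed example."""
    return 1.0 / (math.pi * (1.0 + x * x))


grid = [i / 2.0 for i in range(-20, 21)]        # hypothetical grid on [-10, 10]
print(is_r_concave_on_grid(cauchy_density, -0.5, grid))
print(is_r_concave_on_grid(cauchy_density, 0.0, grid))
```

Under these assumptions the check passes for $r = -1/2$ and fails for $r = 0$ (the Cauchy density is not log-concave), which illustrates why classes with $r < 0$ can accommodate tails heavier than exponential.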

These results motivate definitions of the classes
$\mathcal{P}(y_+^{-s}) = \{ p(x) = g(x)^{-s} : g \text{ is convex} \}$
for $s > 0$ and, more generally, for a fixed monotone function $h$
from $\mathbb{R}$ to $\mathbb{R}$,

\[ \mathcal{P}(h) = \{ p(x) = h(g(x)) : g \text{ convex} \}. \]

Such generalizations of log-concave densities and log-concave measures based
on means of order r have been introduced by a series of authors, some- 
times with differing terminology, apparently starting with Avriel [1972], and 
continuing with Borell [1975], Brascamp and Lieb [1976], Prékopa [1973],
Rinott [1976], and Uhrin [1984]. A nice summary of these connections is
given by Dharmadhikari and Joag-Dev [1988]. These authors also present
results concerning preservation of r-concavity under a variety of operations
including products, convolutions, and marginalization. In the mathematics
literature, the underlying fundamental inequality has come to be known as
the Borell-Brascamp-Lieb inequality; see e.g. Cordero-Erausquin, McCann
and Schmuckenschläger [2001]. For these heavy-tailed classes, development
of isoperimetric and Poincaré inequalities is also underway: see e.g. Bobkov
[2007a] and Bobkov and Ledoux [2009].

Despite the long-standing and current rapid development of the properties 
of such classes of densities on the probability side, very little has been done 
from the standpoint of nonparametric estimation, especially when d > 2. 

Nonparametric estimation of a log-concave density on $\mathbb{R}^d$ was initiated by
Cule, Samworth and Stewart [2007]. These authors developed an algorithm
for computing their estimators and explored several interesting applications.
Koenker and Mizera [2008] developed a family of penalized criterion
functions related to the Rényi divergence measures, and explored duality in the
optimization problems. They did not succeed in establishing consistency of
their estimators, but did investigate Fisher consistency. Recently, Cule and
Samworth [2009] have established consistency of the (nonparametric)
maximum likelihood estimator of a log-concave density on $\mathbb{R}^d$, even in a setting of
model misspecification: when the true density is not log-concave, the
estimator converges to the closest log-concave density to the true density in
the sense of Kullback-Leibler divergence.

In this paper our goal is to investigate maximum likelihood estimation in
the classes $\mathcal{P}(h)$ corresponding to a fixed monotone (decreasing or
increasing) function $h$. In particular, for decreasing functions $h$, we handle all the
r-concave classes $\mathcal{P}(y_+^{1/r})$ with $r = -1/s$ and $r > -1/d$ (or $s > d$). On
the increasing side, we treat, in particular, the cases $h(y) = y 1_{[0,\infty)}(y)$ and
$h(y) = e^y$ with $C = \mathbb{R}_+^d$. The first of these corresponds to an interesting
class of models which can be thought of as multivariate generalizations of
the class of decreasing and convex densities on $\mathbb{R}_+$ treated by Groeneboom,
Jongbloed and Wellner [2001], while the second, $h(y) = e^y$, corresponds to
multivariate versions of the log-convex families studied by An [1998]. Note
that our increasing classes $\mathcal{P}(y_+^{1/r}, \mathbb{R}_+^d)$ with $r > 0$ are quite different from
the r-concave classes defined above and appear to be completely new,
corresponding instead to r-convex densities on $\mathbb{R}_+^d$.

Here is an outline of the rest of the paper. All of our main results are 
presented in Section 2. Subsection 2.1 gives definitions and basic properties 
of the transformations involved. Subsection 2.2 establishes existence of the 
maximum likelihood estimators for both increasing and decreasing trans- 
formations h under suitable conditions on the function h. In subsection 2.3 
we give statements concerning consistency of the estimators, both in the 
Hellinger metric and in uniform metrics under natural conditions. In sub- 
section 2.4 we present asymptotic minimax lower bounds for estimation in 
these classes under natural curvature hypotheses. We conclude this section 
with a brief discussion of conjectures concerning attainability of the mini- 
max rates by the maximum likelihood estimators. All the proofs are given 
in Section 3. We summarize a number of key results from convex analysis in 
an appendix, Section 4. 

1.2. Convex-transformed density estimation. Now let $X_1, \ldots, X_n$ be $n$
independent random variables distributed according to a probability density
$p_0 = h(g_0(x))$ on $\mathbb{R}^d$; here $h$ is a fixed monotone (increasing or decreasing)
function and $g_0$ is an (unknown) convex function. The probability measure
on the Borel sets $\mathcal{B}_d$ corresponding to $p_0$ is denoted by $P_0$.

The maximum likelihood estimator (MLE) of a log-concave density on
$\mathbb{R}$ was introduced in Rufibach [2006] and Dümbgen and Rufibach [2007].
Algorithmic aspects were treated in Rufibach [2007] and in a more general
framework in Dümbgen, Hüsler and Rufibach [2007], while consistency with
respect to the Hellinger metric was established by Pal, Woodroofe and Meyer
[2007], and rates of convergence of $\hat{f}_n$ and $\hat{F}_n$ were established by Dümbgen
and Rufibach [2007]. Asymptotic distribution theory for the MLE of a
log-concave density on $\mathbb{R}$ was established by Balabdaoui, Rufibach and Wellner
[2009].

If $\mathcal{C}$ denotes the class of all closed proper convex functions
$g : \mathbb{R}^d \to (-\infty, \infty]$, the estimator $\hat{g}_n$ of $g_0$ is the maximizer of the functional

\[ \mathbb{L}_n g = \int (\log h) \circ g \, d\mathbb{P}_n \]

over the class $\mathcal{G}(h) \subset \mathcal{C}$ of all convex functions $g$ such that $h \circ g$ is a density,
where $\mathbb{P}_n$ is the empirical measure of the observations. The maximum
likelihood estimator of the convex-transformed density $p_0$ is then $\hat{p}_n := h(\hat{g}_n)$
when it exists and is unique. We investigate conditions for existence and
uniqueness in Section 2.
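
The criterion being maximized is an empirical average of $\log h(g(X_i))$. As a minimal sketch (not the authors' algorithm, which must search over the whole class of admissible convex functions), the Python snippet below only evaluates this criterion for two candidate functions $g$ in the log-concave case $h(y) = e^{-y}$. The sample values and the helper names `log_likelihood` and `make_normal_g` are hypothetical and chosen purely for illustration.

```python
import math


def log_likelihood(h, g, sample):
    """Empirical criterion: (1/n) * sum_i log h(g(X_i))."""
    return sum(math.log(h(g(x))) for x in sample) / len(sample)


def h(y):
    """Decreasing transformation of the log-concave model."""
    return math.exp(-y)


def make_normal_g(mu, sigma):
    """Convex g such that exp(-g) is the N(mu, sigma^2) density."""
    const = math.log(sigma) + 0.5 * math.log(2.0 * math.pi)
    return lambda x: 0.5 * ((x - mu) / sigma) ** 2 + const


sample = [-1.2, -0.4, 0.1, 0.3, 0.9]            # hypothetical observations
ll_close = log_likelihood(h, make_normal_g(0.0, 1.0), sample)
ll_far = log_likelihood(h, make_normal_g(5.0, 1.0), sample)
print(ll_close > ll_far)
```

A candidate whose density is centered near the data scores higher than one centered far away; the MLE is the maximizer of this score over all admissible convex $g$, when it exists.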

2. Main results.

2.1. Definitions and basic properties. To construct the classes of
convex-transformed densities of interest here, we first need to define two classes of
monotone transformations. An increasing transformation $h$ is a nondecreasing
function $\overline{\mathbb{R}} \to \overline{\mathbb{R}}_+$ such that $h(-\infty) = 0$ and $h(+\infty) = +\infty$. We define
the limit points $y_0 < y_\infty$ of the increasing transformation $h$ by:

\[ y_0 = \inf\{ y : h(y) > 0 \}, \qquad y_\infty = \sup\{ y : h(y) < +\infty \}. \]

We make the following assumptions about the asymptotic behavior of the
increasing transformation.

(I.1) The function $h(y)$ is $o(|y|^{-\alpha})$ for some $\alpha > d$ as $y \to -\infty$.

(I.2) If $y_\infty < +\infty$ then $h(y) \asymp (y_\infty - y)^{-\beta}$ for some $\beta > d$ as $y \uparrow y_\infty$.

(I.3) The function $h$ is continuously differentiable on the interval $(y_0, y_\infty)$.

Note that assumption I.1 is satisfied if $y_0 > -\infty$.

Definition 2.1. For an increasing transformation $h$, an increasing class
of convex-transformed densities, or simply an increasing model $\mathcal{P}(h)$ on $\mathbb{R}_+^d$,
is the family of all bounded densities which have the form $h \circ g \equiv h(g(\cdot))$,
where $g$ is a closed proper convex function with $\operatorname{dom} g = \overline{\mathbb{R}}_+^d$.

Remark 2.2. Consider a density $h \circ g$ from an increasing model $\mathcal{P}(h)$.
Since $h \circ g$ is bounded we have $g < y_\infty$. The function $\tilde{g} = \max(g, y_0)$ is
convex and $h \circ \tilde{g} = h \circ g$. Thus we can assume that $g \ge y_0$.

A decreasing transformation $h$ is a nonincreasing function $\overline{\mathbb{R}} \to \overline{\mathbb{R}}_+$ such
that $h(-\infty) = +\infty$ and $h(+\infty) = 0$. We define the limit points $y_0 > y_\infty$ of
the decreasing transformation $h$ by:

\[ y_0 = \sup\{ y : h(y) > 0 \}, \qquad y_\infty = \inf\{ y : h(y) < +\infty \}. \]

We make the following assumptions about the asymptotic behavior of the
decreasing transformation.

(D.1) The function $h(y)$ is $o(y^{-\alpha})$ for some $\alpha > d$ as $y \to +\infty$.

(D.2) If $y_\infty > -\infty$ then $h(y) \asymp (y - y_\infty)^{-\beta}$ for some $\beta > d$ as $y \downarrow y_\infty$.

(D.3) If $y_\infty = -\infty$ then $h(y)^\gamma h(-Cy) = o(1)$ for some $\gamma, C > 0$ as
$y \to -\infty$.

(D.4) The function $h$ is continuously differentiable on the interval $(y_\infty, y_0)$.

Note that assumption D.1 is satisfied if $y_0 < +\infty$. Now we define the
decreasing class of densities $\mathcal{P}(h)$.

Definition 2.3. For a decreasing transformation $h$, a decreasing class
of convex-transformed densities, or simply a decreasing model $\mathcal{P}(h)$ on $\mathbb{R}^d$, is
the family of all bounded densities which have the form $h \circ g$, where $g$ is a
closed proper convex function with $\dim(\operatorname{dom} g) = d$.

Remark 2.4. Consider a density $h \circ g$ from a decreasing model $\mathcal{P}(h)$.
Since $h \circ g$ is bounded we have $g > y_\infty$. For the sublevel set $C = \operatorname{lev}_{y_0} g$, the
function $\tilde{g} = g + \delta(\cdot \mid C)$ is convex and $h \circ \tilde{g} = h \circ g$. Thus we can assume
that $\operatorname{lev}_{y_0} g = \operatorname{dom} g$.

For a monotone transformation $h$ we denote by $\mathcal{G}(h)$ the class of all convex
functions $g$ such that $h \circ g$ belongs to the monotone class $\mathcal{P}(h)$. The following
lemma allows us to compare models defined by increasing or decreasing
transformations $h$.

Lemma 2.5. Consider two decreasing (or increasing) models $\mathcal{P}(h_1)$ and
$\mathcal{P}(h_2)$. If $h_1 = h_2 \circ f$ for some convex function $f$, then $\mathcal{P}(h_1) \subseteq \mathcal{P}(h_2)$.

Proof. The argument below is for a decreasing model. For an increasing
model the proof is similar. If $f(x) > f(y)$ for some $x < y$, then $f$ is decreasing
on $(-\infty, x)$, $f(-\infty) = +\infty$, and therefore $h_2$ is constant on $(f(x), +\infty)$ and
we can redefine $f(y) = f(x)$ for all $y < x$. Thus we can always assume that
$f$ is nondecreasing.

For any convex function $g$, the function $f \circ g$ is also convex, since $f$ is
convex and nondecreasing. Therefore, if $p = h_1 \circ g \in \mathcal{P}(h_1)$, then
$p = h_2 \circ f \circ g \in \mathcal{P}(h_2)$. □
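
The argument of the proof can be illustrated numerically. The Python sketch below is an illustration under assumed choices, not part of the paper: it takes $h_1(y) = e^{-y}$, $h_2(y) = y^{-s}$ with $s = 3$, the convex nondecreasing link $f(y) = e^{y/s}$ (so that $h_1 = h_2 \circ f$), and a hypothetical convex function $g$. It then checks on a grid that $h_1 \circ g$ and $h_2 \circ (f \circ g)$ coincide and that $f \circ g$ passes a midpoint-convexity test.

```python
import math

s = 3.0                                   # hypothetical power, chosen for illustration


def h1(y):
    """Log-concave transformation."""
    return math.exp(-y)


def h2(y):
    """Power transformation y^{-s}, evaluated only at y > 0 here."""
    return y ** (-s)


def f(y):
    """Convex, nondecreasing link with h1 = h2 o f."""
    return math.exp(y / s)


def g(x):
    """A convex function on the real line (hypothetical choice)."""
    return 0.5 * x * x + abs(x)


def midpoint_convex(func, grid, tol=1e-12):
    """Necessary check for convexity: func((x+y)/2) <= (func(x)+func(y))/2."""
    return all(func(0.5 * (x + y)) <= 0.5 * (func(x) + func(y)) + tol
               for x in grid for y in grid)


grid = [i / 4.0 for i in range(-16, 17)]
same_density = all(math.isclose(h1(g(x)), h2(f(g(x))), rel_tol=1e-12) for x in grid)
fg_convex = midpoint_convex(lambda x: f(g(x)), grid)
print(same_density, fg_convex)
```

Both checks succeed for these choices: the same density is represented in the second model through the convex function $f \circ g$, which is exactly the inclusion asserted by the lemma.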

In this section we discuss several examples of monotone models. First two 
families based on increasing transformations h: 

Example 2.6 (Log-convex densities). This increasing model is defined
by $h(y) = e^y$ and $C = \mathbb{R}_+^d$. Limit points are $y_0 = -\infty$ and $y_\infty = \infty$.
Assumption I.1 holds for any $\alpha > d$. These classes of densities were considered
by An [1998], who established several useful preservation properties. In
particular, log-convexity is preserved under mixtures (An [1998, Proposition 3])
and under marginalization (An [1998, Remark 8, p. 361]).

Example 2.7 (r-convex densities). This family of increasing models is
defined by the transforms $h(y) = \max(y, 0)^s \equiv y_+^s$ with $s > 0$ and $C = \mathbb{R}_+^d$.
Limit points are $y_0 = 0$ and $y_\infty = +\infty$. Assumption I.1 holds for any
$\alpha > d$. As noted in Section 1, the model $\mathcal{P}(y_+^s, \mathbb{R}_+^d)$ corresponds to the class
of r-convex densities, with $r = \infty$ corresponding to the log-convex densities
of the previous example. For $r < \infty$ these classes seem not to have been
previously discussed or considered, except in special cases: the case $r = 1$
and $d = 1$ corresponds to the class of decreasing convex densities on $\mathbb{R}_+$
considered by Groeneboom, Jongbloed and Wellner [2001]. It follows from
Lemma 2.5 that

\[ \mathcal{P}(e^y, \mathbb{R}_+^d) \subset \mathcal{P}(y_+^{s_1}, \mathbb{R}_+^d) \subset \mathcal{P}(y_+^{s_2}, \mathbb{R}_+^d), \quad \text{for } 0 < s_2 < s_1 < \infty. \tag{2.1} \]

Now for some models based on decreasing transformations h: 

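Example 2.8 (Log-concave densities). This decreasing model is defined
by the transform $h(y) = e^{-y}$. Limit points are $y_0 = +\infty$ and $y_\infty = -\infty$.
Assumption D.1 holds for any $\alpha > d$. Assumption D.3 holds for any
$C > \gamma > 0$, since $h(y)^\gamma h(-Cy) = e^{(C - \gamma) y} \to 0$ as $y \to -\infty$ exactly when $C > \gamma$.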

Many parametric models are subsets of this model. Below we specify
convex functions $g$ which correspond to the densities of several distributions:

1. Uniform: The density of a uniform distribution on a convex set $C$ is
log-concave:

\[ g(x) = \log(\mu[C]) + \delta(x \mid C). \]

2. Normal: The density of a multivariate normal distribution $(\mu, \Sigma)$ with $\Sigma$
nonsingular is log-concave:

\[ g(x) = \tfrac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) + \tfrac{1}{2} \log |\Sigma| + \tfrac{d}{2} \log 2\pi. \]

3. Gamma: The density of a Gamma distribution $(r, \lambda)$ is log-concave for $r > 1$:

\[ g(x) = -(r - 1) \log x + \lambda x - r \log \lambda + \log \Gamma(r). \]

4. Beta: The density of a Beta distribution $(\alpha, \beta)$ is log-concave for $\alpha, \beta > 1$:

\[ g(x) = -(\alpha - 1) \log x - (\beta - 1) \log(1 - x) + \log B(\alpha, \beta). \]
Gumbel, Fréchet and logistic distributions also have log-concave densities.

Example 2.9 (r-concave densities; power-convex densities). This family
of decreasing models is defined by the transforms $h(y) = y_+^{-s}$ for $s > d$.
Limit points are $y_0 = +\infty$ and $y_\infty = 0$. Assumption D.1 holds for any
$\alpha \in (d, s)$. Assumption D.2 holds for $\beta = s$. As noted in Section 1, the
model $\mathcal{P}(y_+^{1/r}) = \mathcal{P}(y_+^{-s})$ (with $r = -1/s < 0$) corresponds to the class of
r-concave densities. From Lemma 2.5 we have the following inclusion:

\[ \mathcal{P}(e^{-y}) \subset \mathcal{P}(y_+^{-s_2}) \subset \mathcal{P}(y_+^{-s_1}), \quad \text{for } s_1 < s_2. \tag{2.2} \]

The models defined by power transformations include some parametric
models with heavier than exponential tails. The following examples are
discussed in Borell [1975]. We use Johnson and Kotz [1972] as a reference.

1. Pareto (Johnson and Kotz [1972], 42.3): The density of a multivariate
Pareto distribution $(\theta, a)$ is power-convex for $s \in (d, a + d]$:

[expression for $g(x)$ missing in the source text]

2. Student (Johnson and Kotz [1972], 37.3): The density of a multivariate
t-distribution $(d, n, \mu, \Sigma)$ is power-convex for $s \in (d, n + d]$:

\[ g(x) = \left( \frac{\Gamma(n/2) \, (n\pi)^{d/2} \, |\Sigma|^{1/2}}{\Gamma((d + n)/2)} \right)^{1/s} \left( 1 + \frac{1}{n} (x - \mu)^T \Sigma^{-1} (x - \mu) \right)^{(d + n)/(2s)}. \]

3. Snedecor (Johnson and Kotz [1972], 40.8): The density of a multivariate
F-distribution $(n_0, n_1, \ldots, n_d)$ with $n_i > 2$ is power-convex for
$s \in (d, (n_0/2) + d]$:

[expression for $g(x)$ missing in the source text]

where $n = \sum_{i=0}^d n_i$.

Since the distributions above belong to the power-convex models only for
bounded values of the parameter $s$, the inclusion (2.2) implies that they do
not belong to the log-concave model (corresponding to $s = +\infty$).

Borell [1975] developed a framework which unifies log-concave and
power-convex densities and gives an interesting characterization of these classes. Here
we briefly state the main result.

Definition 2.10. Let $C \subset \mathbb{R}^d$ be an open convex set and let $s \in \mathbb{R}$.
Then we define $\mathcal{M}_s(C)$ as the family of all positive Radon measures $\mu$ on $C$
such that:

\[ \mu_*(\theta A + (1 - \theta) B) \ge [\theta \mu_*(A)^s + (1 - \theta) \mu_*(B)^s]^{1/s} \tag{2.3} \]

holds for all nonempty $A, B \subset C$ and all $\theta \in (0, 1)$. We define $\mathcal{M}_s^\circ(C)$ as a
subfamily of $\mathcal{M}_s(C)$ which consists of probability measures such that the
affine hull of its support has dimension $d$. Here $\mu_*$ is the inner measure
corresponding to $\mu$, and the cases $s = 0, \infty$ are defined by continuity.

Then one of the main results of Borell [1975], Prékopa [1973], and Rinott
[1976] is as follows:

Theorem 2.11 (Borell, Prékopa, Rinott). For $s < 0$ the family $\mathcal{M}_s^\circ(\mathbb{R}^d)$
coincides with the power-convex family $\mathcal{P}(y_+^{-d + 1/s})$. For $s = 0$ the family
$\mathcal{M}_0^\circ(\mathbb{R}^d)$ coincides with the log-concave family $\mathcal{P}(e^{-y})$.

Finally, Corollary 3.1 of Borell [1975] says that the condition (2.3) can be
relaxed:

Theorem 2.12 (Borell). Let $\Omega \subset \mathbb{R}^d$ be an open convex set and let $\mu$ be
a positive Radon measure in $\Omega$. Then $\mu \in \mathcal{M}_s(\Omega)$ if and only if

\[ \mu\left(\tfrac{1}{2} A_1 + \tfrac{1}{2} A_2\right) \ge \left[\tfrac{1}{2} \mu(A_1)^s + \tfrac{1}{2} \mu(A_2)^s\right]^{1/s} \]

holds for all compact (or open, or semiopen) blocks $A_1, A_2 \subset \Omega$ (i.e.
rectangles with sides parallel to the coordinate axes).

Theorem 2.11 gives a special case of what has come to be known as
the Borell-Brascamp-Lieb inequality; see e.g. Dharmadhikari and Joag-Dev
[1988], and Brascamp and Lieb [1976]. The current terminology is apparently
due to Cordero-Erausquin, McCann and Schmuckenschläger [2001].

2.2. Existence of the Maximum Likelihood Estimators. Now suppose that
$X_1, \ldots, X_n$ are i.i.d. with density $p_0(x) = h(g_0(x))$ for a fixed monotone
transformation $h$ and a convex function $g_0$. As before, $\mathbb{P}_n = n^{-1} \sum_{i=1}^n \delta_{X_i}$
is the empirical measure of the $X_i$'s and $P_0$ is the probability measure
corresponding to $p_0$. Then $\mathbb{L}_n g = \mathbb{P}_n \log h \circ g$ is the log-likelihood function
(divided by $n$), and

\[ \hat{p}_n = \operatorname{argmax}\{ \mathbb{L}_n g : h \circ g \in \mathcal{P}(h) \} \]

is the maximum likelihood estimator of $p$ over the class $\mathcal{P}(h)$, assuming it
exists and is unique. We also write $\hat{g}_n$ for the MLE of $g$. We first state
our main results concerning existence and uniqueness of the MLEs for the
classes $\mathcal{P}(h)$:

Theorem 2.13. Suppose that $h$ is an increasing transformation satisfying
assumptions I.1 - I.3. Then the MLE $\hat{p}_n$ exists almost surely for the
model $\mathcal{P}(h)$.

Theorem 2.14. Suppose that $h$ is a decreasing transformation satisfying
assumptions D.1 - D.4. Then the MLE $\hat{p}_n$ exists almost surely for the model
$\mathcal{P}(h)$ if $n > n_d$, where the value of $n_d$ depends on whether $y_\infty = -\infty$ or
$y_\infty > -\infty$ [the two expressions for $n_d$ are missing in the source text].

Uniqueness of the MLE is known for the log-concave model $\mathcal{P}(e^{-y})$; see
e.g. Duembgen and Rufibach [2009] for $d = 1$ and Cule et al. [2007] for $d > 1$.
For a brief further comment see Section 2.5.

2.3. Consistency of the Maximum Likelihood Estimators. Once existence
of the MLEs is assured, our attention shifts to other properties of the
estimators: our main concern in this subsection is consistency. While for a
decreasing model it is possible to prove consistency without any
restrictions, for an increasing model we need the following assumptions about the
true density $h \circ g_0$:

(I.4) The function $g_0$ is bounded by some constant $C < y_\infty$.

(I.5) If $d > 1$ then we have, with $|x| \equiv \prod_{j=1}^d x_j$ for $x \in \mathbb{R}_+^d$,

[displayed condition missing in the source text]

Remark 2.15. Note that for $d = 1$ the assumption I.5 follows from the
assumption I.4 and integrability of $\log(1/x)$ at zero. This assumption is also
true if $P_0$ has finite marginal densities.

(I.6) We have:

[displayed condition missing in the source text]

Let $H(p, q)$ denote the Hellinger distance between two probability
measures with densities $p$ and $q$ with respect to Lebesgue measure on $\mathbb{R}^d$:

\[ H^2(p, q) = \frac{1}{2} \int \left( \sqrt{p(x)} - \sqrt{q(x)} \right)^2 dx = 1 - \int \sqrt{p(x) q(x)} \, dx. \tag{2.4} \]
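
The two expressions for the squared Hellinger distance agree because both densities integrate to one. As a small numerical sketch (the midpoint rule, the integration range and the helper names are assumptions made for illustration), the Python snippet below evaluates both forms for the densities of $N(0, 1)$ and $N(1, 1)$ and compares them with the known closed form $1 - \exp(-(\mu_1 - \mu_2)^2 / (8\sigma^2))$ for two normal densities with common variance.

```python
import math


def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def hellinger_sq_two_ways(p, q, lo, hi, n=20000):
    """Midpoint-rule approximations of both expressions for H^2(p, q) on [lo, hi]."""
    dx = (hi - lo) / n
    half_sq_diff = 0.0
    affinity = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        px, qx = p(x), q(x)
        half_sq_diff += 0.5 * (math.sqrt(px) - math.sqrt(qx)) ** 2 * dx
        affinity += math.sqrt(px * qx) * dx
    return half_sq_diff, 1.0 - affinity


def p(x):
    return normal_pdf(x, 0.0, 1.0)


def q(x):
    return normal_pdf(x, 1.0, 1.0)


first, second = hellinger_sq_two_ways(p, q, -12.0, 13.0)
closed_form = 1.0 - math.exp(-1.0 / 8.0)
print(first, second, closed_form)
```

All three numbers coincide up to the quadrature error, which is a quick sanity check on the normalization with the factor $1/2$ used in (2.4).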

Our main results about increasing models are as follows. 

Theorem 2.16. For an increasing model $\mathcal{P}(h)$ where $h$ satisfies
assumptions I.1 - I.3, and for the true density $h \circ g_0$ which satisfies
assumptions I.4 - I.6, the sequence of MLEs $\{\hat{p}_n = h \circ \hat{g}_n\}$ is Hellinger consistent:

\[ H(\hat{p}_n, p_0) = H(h \circ \hat{g}_n, h \circ g_0) \to_{a.s.} 0. \]

Theorem 2.17. For an increasing model $\mathcal{P}(h)$ where $h$ satisfies
assumptions I.1 - I.3, and for the true density $h \circ g_0$ which satisfies
assumptions I.4 - I.6, the sequence of MLEs $\hat{g}_n$ is pointwise consistent. That is,
$\hat{g}_n(x) \to_{a.s.} g_0(x)$ for $x \in \operatorname{ri}(\mathbb{R}_+^d)$, and convergence is uniform on compacta.

The results about decreasing models can be formulated in a similar way.

Theorem 2.18. For a decreasing model $\mathcal{P}(h)$ where $h$ satisfies
assumptions D.1 - D.4, the sequence of MLEs $\{\hat{p}_n = h \circ \hat{g}_n\}$ is Hellinger consistent:

\[ H(\hat{p}_n, p_0) = H(h \circ \hat{g}_n, h \circ g_0) \to_{a.s.} 0. \]

Theorem 2.19. For a decreasing model $\mathcal{P}(h)$ with $h$ satisfying
assumptions D.1 - D.4, the sequence of MLEs $\hat{g}_n$ is pointwise consistent in the
following sense. Define $g_0^* = g_0 + \delta(\cdot \mid \operatorname{ri}(\operatorname{dom} g_0))$. Then $g_0^* = g_0$ a.e.,

\[ \hat{g}_n \to_{a.s.} g_0^*, \]

and the convergence is uniform on compacta. Moreover, if $\operatorname{dom} g_0 = \mathbb{R}^d$
then:

\[ \| h \circ \hat{g}_n - h \circ g_0 \|_\infty \to_{a.s.} 0. \]

2.4. Local Asymptotic Minimax Lower Bounds. In this section we establish
local asymptotic minimax lower bounds for any estimator of several functionals of
interest on the family $\mathcal{P}(h)$ of convex-transformed densities. We start with
several general results following Jongbloed [2000], and then apply them to
estimation at a fixed point and to mode estimation.

First, we define minimax risk as in Donoho and Liu [1991]:

Definition 2.20. Let $\mathcal{P}$ be a class of densities on $\mathbb{R}^d$ with respect to
Lebesgue measure and $T$ be a functional $T : \mathcal{P} \to \mathbb{R}$. For an increasing
convex loss function $l$ on $\mathbb{R}_+$ we define the minimax risk as:

\[ R_l(n; T, \mathcal{P}) = \inf_{t_n} \sup_{p \in \mathcal{P}} E_{p^{\times n}} \, l(|t_n(X_1, \ldots, X_n) - Tp|), \tag{2.5} \]

where $t_n$ ranges over all possible estimators of $Tp$ based on $X_1, \ldots, X_n$.

The main result (Theorem 1) in Jongbloed [2000] can be formulated as: 

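Theorem 2.21 (Jongbloed). Let $\{p_n\}$ be a sequence of densities in $\mathcal{P}$
such that:

\[ \limsup_{n \to \infty} \sqrt{n} \, H(p_n, p) \le \tau \]

for some density $p$ in $\mathcal{P}$. Then:

\[ \liminf_{n \to \infty} \frac{R_l(n; T, \{p, p_n\})}{l\left( \tfrac{1}{4} e^{-2\tau^2} |T p_n - T p| \right)} \ge 1. \tag{2.6} \]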

It will be convenient to reformulate this result in the following form: 

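Corollary 2.22. Suppose that for any $\varepsilon > 0$ small enough, there exists
$p_\varepsilon \in \mathcal{P}$ such that for some $r > 0$:

\[ \lim_{\varepsilon \to 0} \varepsilon^{-1} |T p_\varepsilon - T p| = 1 \]

and

\[ \limsup_{\varepsilon \to 0} \varepsilon^{-r} H(p_\varepsilon, p) \le c. \]

Then, there exists a sequence $\{p_n\}$ such that:

\[ \liminf_{n \to \infty} n^{1/(2r)} R_1(n; T, \{p, p_n\}) \ge \frac{1}{4} (4er)^{-1/(2r)} c^{-1/r}, \tag{2.7} \]

where $R_1$ is the risk which corresponds to $l(x) = |x|$.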

Corollary 2.22 shows that, for a fixed change in the value of the functional
$T$, a family $p_\varepsilon$ which is closer to the true density $p$ with respect to Hellinger
distance provides a sharper lower bound. This suggests that for a functional
$T$ which depends only on the local structure of the density we would like our
family $\{p_\varepsilon\}$ to deviate from $p$ also locally. Below, we define formally such
local deviations.

Definition 2.23. We call a family of measurable functions $\{p_\varepsilon\}$ a
deformation of a measurable function $p$ if $p_\varepsilon$ is defined for any $\varepsilon > 0$ small
enough,

\[ \lim_{\varepsilon \to 0} \operatorname{ess\,sup} |p - p_\varepsilon| = 0, \]

and there exists a bounded family of real numbers $r_\varepsilon$ and a point $x_0$ such
that:

\[ \mu[\operatorname{supp} |p_\varepsilon(x) - p(x)|] > 0, \qquad \operatorname{supp} |p_\varepsilon(x) - p(x)| \subset B(x_0, r_\varepsilon). \]

If in addition we have

\[ \lim_{\varepsilon \to 0} r_\varepsilon = 0, \]

we say that $\{p_\varepsilon\}$ is a local deformation at $x_0$.

Since for a deformation $p_\varepsilon$ we have $\mu[\operatorname{supp} |p_\varepsilon(x) - p(x)|] > 0$, for every
$\varepsilon > 0$ there exists $\delta > 0$ such that $\mu\{x : |p_\varepsilon(x) - p(x)| > \delta\} > 0$, and thus
the $L_r$-distance from $p_\varepsilon$ to $p$ is positive for all $\varepsilon > 0$. Note that this is always
true if $p$ and $p_\varepsilon$ are continuous at $x_0$ and $p_\varepsilon(x_0) \neq p(x_0)$.

Now we can state our lower bound for estimation of the convex-transformed
density value at a fixed point $x_0$. This result relies on the properties of
strongly convex functions as described in Appendix 4.4 and can be applied
to both increasing and decreasing classes of convex-transformed densities.

Theorem 2.24. Let $h$ be a monotone transformation, let $p = h \circ g \in \mathcal{P}(h)$
be a convex-transformed density, and suppose that $x_0$ is a point in
$\operatorname{ri}(\operatorname{dom} g)$ such that $h$ is continuously differentiable at $g(x_0)$, $h \circ g(x_0) > 0$,
$h' \circ g(x_0) \neq 0$, and $\operatorname{curv}_{x_0} g > 0$. Then, for the functional $T(h \circ g) = g(x_0)$
there exists a sequence $\{p_n\} \subset \mathcal{P}(h)$ such that:

\[ \liminf_{n \to \infty} n^{2/(d+4)} R_1(n; T, \{h \circ g, p_n\}) \ge C(d) \left[ \frac{h \circ g(x_0)^2 \operatorname{curv}_{x_0} g}{h' \circ g(x_0)^4} \right]^{1/(d+4)}, \tag{2.8} \]

where the constant $C(d)$ depends only on the dimension $d$.

Remark 2.25. If in addition $g$ is twice continuously differentiable at $x_0$
and $\nabla^2 g(x_0)$ is positive definite, then by Lemma we have
$\operatorname{curv}_{x_0} g = \det(\nabla^2 g(x_0))$.

In Jongbloed [2000] lower bounds were constructed for functionals with
values in $\mathbb{R}$. However, it is easy to see that the proof does not change for
functionals with values in an arbitrary metric space $(V, s)$ if instead of
$|Tp - Tp_n|$ we consider $s(Tp, Tp_n)$. We define:

\[ R_s(n; T, \mathcal{P}) = \inf_{t_n} \sup_{p \in \mathcal{P}} E_{p^{\times n}} \, s(t_n(X_1, \ldots, X_n), Tp), \tag{2.9} \]

and the analogue of Corollary 2.22 has the following form:

Corollary 2.26. Suppose that for any $\varepsilon > 0$ small enough, there exists
$p_\varepsilon \in \mathcal{P}$ such that for some $r > 0$:

\[ \lim_{\varepsilon \to 0} \varepsilon^{-1} s(T p_\varepsilon, T p) = 1 \]

and

\[ \limsup_{\varepsilon \to 0} \varepsilon^{-r} H(p_\varepsilon, p) \le c. \]

Then, there exists a sequence $\{p_n\}$ such that:

\[ \liminf_{n \to \infty} n^{1/(2r)} R_s(n; T, \{p, p_n\}) \ge \frac{1}{4} (4er)^{-1/(2r)} c^{-1/r}. \tag{2.10} \]

In this section we consider estimation of the functional

\[ T(h \circ g) = \operatorname{argmin}(g) \in \mathbb{R}^d \]

for the density $p = h \circ g \in \mathcal{P}(h)$, assuming that the minimum is unique. This
is equivalent to estimation of the mode of $p = h \circ g$.

Construction of a lower bound for the functional $T$ is similar to the
procedure we presented for estimation of $p = h \circ g$ at a fixed point $x_0$. Again,
we use two opposite deformations: one is local and changes the functional
value; the other is a convex combination with a fixed deformation and is
negligible in the Hellinger distance computation. However, in this case the minimax
rate depends also on the growth rate of $g$.

Theorem 2.27. Let $h$ be a decreasing transformation, let $h \circ g \in \mathcal{P}(h)$
be a convex-transformed density, and let a point $x_0 \in \operatorname{ri}(\operatorname{dom} g)$ be a unique
global minimum of $g$ such that $h$ is continuously differentiable at $g(x_0)$,
$h' \circ g(x_0) \neq 0$ and $\operatorname{curv}_{x_0} g > 0$. In addition, let us assume that $g$ is locally
Hölder continuous at $x_0$:

\[ |g(x) - g(x_0)| \le L \|x - x_0\|^\gamma \]

with respect to some norm $\| \cdot \|$. Then, for the functional $T(h \circ g) = \operatorname{argmin} g$
there exists a sequence $\{p_n\} \subset \mathcal{P}(h)$ such that:

\[ \liminf_{n \to \infty} n^{2/(\gamma(d+4))} R_s(n; T, \{p, p_n\}) \ge C(d) L^{-1/\gamma} \left[ \frac{h \circ g(x_0)^2 \operatorname{curv}_{x_0} g}{h' \circ g(x_0)^4} \right]^{1/(\gamma(d+4))}, \tag{2.11} \]

where the constant $C(d)$ depends only on the dimension $d$ and the metric $s(x, y)$
is defined as $\|x - y\|$.

Remark 2.28. If in addition $g$ is twice continuously differentiable at $x_0$
and $\nabla^2 g(x_0)$ is positive definite, then by Lemma we have
$\operatorname{curv}_{x_0} g = \det(\nabla^2 g(x_0))$, and $g$ is locally Hölder continuous at $x_0$ with exponent $\gamma = 2$
and any constant $L > \|\nabla^2 g(x_0)\|$.

Remark 2.29. Since $\operatorname{curv}_{x_0} g > 0$ there exists a constant $C$ such that:

[displayed inequality missing in the source text]

and thus we have $\gamma \in (0, 2]$.

2.5. Conjectures concerning uniqueness of MLEs. There exist
counterexamples to uniqueness for nonconvex transformations $h$ which satisfy
assumptions D.1 - D.4. They suggest that uniqueness of the MLE does not depend
on the tail behavior of the transformation $h$ but rather on the local
properties of $h$ in neighborhoods of the optimal values $\hat{g}_n(X_i)$. We conjecture
that uniqueness holds for all monotone models if $h$ is convex and $h/|h'|$ is
nondecreasing convex. Further work on the uniqueness issues is needed.

2.6. Conjectures about rates of convergence for the MLEs. We conjecture
that the (optimal) rate of convergence $n^{2/(d+4)}$ appearing in Theorem 2.24
for estimation of $f(x_0)$ will be achieved by the MLE only for $d = 2, 3$. For
$d = 4$, we conjecture that the MLE will come within a factor $(\log n)^{-\gamma}$ (for
some $\gamma > 0$) of achieving the rate $n^{1/4}$, but for $d > 4$ we conjecture that the
rate of convergence will be the suboptimal rate $n^{1/d}$. This conjectured rate
sub-optimality raises several interesting further issues:

• Can we find alternative estimators (perhaps via penalization or sieve
methods) which achieve the optimal rates of convergence?

• For interesting sub-classes do maximum likelihood estimators remain
rate optimal?

Baraud and Birgé [2009] have recently studied estimation in generalizations
of our classes $\mathcal{P}(h)$ in which both $h$ and $g$ are unknown, with $h : \mathbb{R}^m \to \mathbb{R}$
and $g : \mathbb{R}^d \to \mathbb{R}^m$. While they achieve optimal rates of estimation in their
more general problem with a class of estimators based on multiple testing
and model selection, their procedures will very likely have several drawbacks
relative to the methods we have studied here, including (a) not belonging
to the classes of densities defined by the models, and (b) difficulties in
computation or implementation.

3. Proofs. 

3.1. Preliminaries: properties of increasing transformations. 

Lemma 3.1. Let $h$ be an increasing transformation and $g$ be a closed
proper convex function with $\operatorname{dom} g = \overline{\mathbb{R}}_+^d$ such that

\[ \int_{\mathbb{R}_+^d} h \circ g \, dx = C < \infty. \]

Then the following are true:

1. For a sublevel set $\operatorname{lev}_y g$ with $y > y_0$ we have:

\[ \mu[(\operatorname{lev}_y g)^c] \le C / h(y). \]

2. For any point $x_0 \in \mathbb{R}_+^d$ and any subgradient $a \in \partial g(x_0)$ all coordinates
of $a$ are nonpositive. If in addition $g(x_0) > y_0$ then all coordinates of
$a$ are negative.

3. For any point $x_0 \in \mathbb{R}_+^d$ such that $g(x_0) > y_0$ we have:

\[ h \circ g(x_0) \le \frac{C \, d!}{d^d \, |x_0|}, \]

where $|x| \equiv \prod_{k=1}^d x_k$ for $x \in \mathbb{R}_+^d$.

4. The function $g$ reverses partial order on $\mathbb{R}_+^d$: if $x_1 < x_2$ then
$g(x_1) \ge g(x_2)$, and the last inequality is strict if $g(x_1) > y_0$.

5. The supremum of $g$ on $\mathbb{R}_+^d$ is attained at 0.
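
Part 3 rests on a volume bound for the simplex cut off by a supporting hyperplane: for a vector $a$ with negative coordinates and a point $x_0$ with positive coordinates, the set $\{x \in \mathbb{R}_+^d : a^T (x - x_0) > 0\}$ has volume $((-a)^T x_0)^d / (d! \prod_k (-a_k))$, and this is at least $d^d |x_0| / d!$ by the arithmetic-geometric mean inequality. The Python sketch below checks this inequality on random inputs; the helper names, the random ranges and the tolerance are assumptions made for illustration only.

```python
import math
import random


def simplex_volume(a, x0):
    """Volume of {x >= 0 : a^T (x - x0) > 0} when every coordinate of a is negative."""
    d = len(a)
    t = sum(-ak * xk for ak, xk in zip(a, x0))          # (-a)^T x0 > 0
    return t ** d / (math.factorial(d) * math.prod(-ak for ak in a))


def volume_lower_bound(x0):
    """Lower bound d^d * |x0| / d! with |x0| the product of the coordinates."""
    d = len(x0)
    return d ** d * math.prod(x0) / math.factorial(d)


random.seed(0)
checks = []
for _ in range(1000):
    d = random.randint(1, 6)
    a = [-random.uniform(0.1, 5.0) for _ in range(d)]   # negative subgradient coordinates
    x0 = [random.uniform(0.1, 5.0) for _ in range(d)]   # point with positive coordinates
    checks.append(simplex_volume(a, x0) >= volume_lower_bound(x0) * (1.0 - 1e-12))
all_hold = all(checks)
print(all_hold)
```

Equality occurs when all products $a_k x_{0,k}$ are equal, for instance $a = (-1, -1)$ and $x_0 = (1, 1)$, where both sides equal 2.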

Proof. 1. Since h is nondecreasing we have h{y) > and: 

C = _ d hogdx> / ho gdx > h(y)/j,[(lev g) c \. 
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2. Consider the linear function l(x) = a T (x— xo)+g(xo)- We have g > I. If 
the vector a has a nonnegative coordinate ai then consider a closed ball 
B = B(xq) C If m is a minimum of the function I on i? then the 
minimum of the function hoi on B + Ae^ is equal to h{m + Aoj) , where 
ej is the element of the basis which corresponds to the ith coordinate. 
For A > we have B + Ae« C . 

If a, > then: 

/_ hogdx> I holdx> / holdx > fx[B]h(m + Aaj) — > +oo 
as A — > oo, which contradicts the assumption. 

If Oj = and g(xo) = K x o) > yo 5 then we can choose the radius of the 
ball small enough so that m > i/q. Then: 

/_ h o gdx > _ holdx> / h o Idx > ^[K]h(m) = +oo 
Jm p + ' Jm p + Jk 

where K = U\ > o(B + Aej), and this again contradicts the assumption. 

3. Consider the subgradient $a \in \partial g(x_0)$. For the linear function $l(x) = 
a^T (x - x_0) + g(x_0)$ we have $g \ge l$ and $l(x_0) = g(x_0)$, therefore 
$(\mathrm{lev}_{g(x_0)} l)^c \subseteq (\mathrm{lev}_{g(x_0)} g)^c$. From the previous statement we have that 
$(\mathrm{lev}_{g(x_0)} l)^c$ is a simplex and using the Cauchy-Schwartz inequality we have: 

\[ \mu[(\mathrm{lev}_{g(x_0)} l)^c] = \frac{(a^T x_0)^d}{d! \, |a|} \ge \frac{d^d \, |x_0|}{d!}, \]

which together with 1. proves the statement. 
4. Since $x_1 \in \mathbb{R}^d_+$ and $x_1 < x_2$ we have $x_2 \in \mathbb{R}^d_+ = \mathrm{ri}(\mathrm{dom}\, g)$. For any 
subgradient $a \in \partial g(x_2)$ we have 

\[ g(x_1) - g(x_2) \ge a^T (x_1 - x_2) \ge 0 \]

from the previous statement. Now, if $g(x_1) > y_0$ then we can assume 
that $g(x_2) > y_0$ since otherwise the statement is trivial. In this case 
all coordinates of $a$ are negative and: 

\[ g(x_1) - g(x_2) \ge a^T (x_1 - x_2) > 0. \]

5. From the previous statement we have that $h \circ g \le h \circ g(0)$ on $\mathbb{R}^d_+$, which 
together with continuity of $h \circ g$ implies the statement. 

□ 
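The volume bound used in part 3 of the proof above can be checked numerically. The following is a minimal sketch, not part of the original argument: the helper names `simplex_volume` and `lower_bound` and the test vectors are illustrative assumptions. It uses the closed form $s^d / (d! \prod_k (-a_k))$ with $s = \sum_k (-a_k)(x_0)_k$ for the simplex $\{x \ge 0 : a^T (x - x_0) \ge 0\}$ when all $a_k < 0$, and compares it with $d^d |x_0| / d!$.

```python
import math

def simplex_volume(a, x0):
    """Volume of {x >= 0 : a.(x - x0) >= 0} when every a_k < 0.

    The set is the simplex with vertices 0 and (s / -a_k) e_k, where
    s = sum_k (-a_k) x0_k, so its volume is s^d / (d! * prod_k (-a_k)).
    """
    d = len(a)
    s = sum(-ak * xk for ak, xk in zip(a, x0))
    return s ** d / (math.factorial(d) * math.prod(-ak for ak in a))

def lower_bound(x0):
    """The bound d^d |x0| / d!, with |x0| the product of the coordinates."""
    d = len(x0)
    return d ** d * math.prod(x0) / math.factorial(d)

# Hypothetical test data: one point x0 and three negative subgradients.
x0 = (1.0, 2.0, 0.5)
slopes = [(-1.0, -1.0, -1.0), (-3.0, -0.2, -1.5), (-1.0, -0.5, -2.0)]
for a in slopes:
    # The last slope has -a_k proportional to 1 / x0_k, the equality case.
    assert simplex_volume(a, x0) >= lower_bound(x0) - 1e-12
```

The bound is attained exactly when $-a_k$ is proportional to $1/(x_0)_k$, which is why the third slope is included.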
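Lemma 3.2. Let $h$ be an increasing transformation, $g$ be a closed proper 
convex function on $\overline{\mathbb{R}}^d_+$ and $Q$ be a $\sigma$-finite Borel measure on $\mathbb{R}^d_+$. Then: 

\[ \int_{\mathrm{lev}_a g} h \circ g \, dQ = \int_{-\infty}^{a} h'(y) \, Q[(\mathrm{lev}_y g)^c \cap \mathrm{lev}_a g] \, dy. \]

Proof. Using the Fubini-Tonelli theorem we have: 

\begin{align*}
\int_{\mathrm{lev}_a g} h \circ g \, dQ
 &= \int_{\mathrm{lev}_a g} \int_0^{h(a)} 1\{z \le h \circ g(x)\} \, dz \, dQ(x) \\
 &= \int_{\mathrm{lev}_a g} \int_0^{h(a)} 1\{h^{-1}(z) \le g(x)\} \, dz \, dQ(x) \\
 &= \int_{\mathrm{lev}_a g} \int_{-\infty}^{a} h'(y) \, 1\{y \le g(x)\} \, dy \, dQ(x) \\
 &= \int_{-\infty}^{a} h'(y) \int_{\mathrm{lev}_a g} 1\{y \le g(x)\} \, dQ(x) \, dy \\
 &= \int_{-\infty}^{a} h'(y) \, Q[(\mathrm{lev}_y g)^c \cap \mathrm{lev}_a g] \, dy.
\end{align*}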


□ 
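As an illustration of the identity in Lemma 3.2, the sketch below compares both sides numerically in dimension one. The choices $h(y) = e^y$, $g(x) = -x$ on $[0, \infty)$, $Q$ equal to Lebesgue measure and $a = 0$ are assumptions made only for this example, and `midpoint` is a hypothetical helper implementing a plain midpoint rule on a truncated range.

```python
import math

def midpoint(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

h = math.exp              # increasing transformation with h' = h (assumed example)
g = lambda x: -x          # convex on [0, inf); lev_0 g is all of [0, inf)
a = 0.0

# Left side: integral of h(g(x)) over lev_a g, truncated at x = 50.
lhs = midpoint(lambda x: h(g(x)), 0.0, 50.0)

# Right side: (lev_y g)^c = {x >= 0 : -x > y} = [0, -y) has measure -y for y < 0.
rhs = midpoint(lambda y: h(y) * (-y), -50.0, a)
```

Both quantities approximate $\int_0^\infty e^{-x} dx = 1$; the truncation at 50 only discards terms of order $e^{-50}$.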

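Lemma 3.3. Let $h$ be an increasing transformation and let $g$ be a 
polyhedral convex function with $\mathrm{dom}\, g = \overline{\mathbb{R}}^d_+$, such that: 

\[ \int_{\overline{\mathbb{R}}^d_+} h \circ g \, dx < \infty. \]

Then $g(0) < y_\infty$. 

Proof. For $y_\infty = +\infty$ the statement is trivial so we assume that $y_\infty$ is 
finite. If $g(0) > y_\infty$ then since $g$ is continuous there exists a ball $B \subset \mathbb{R}^d_+$ 
small enough such that $g > y_\infty$ on $B$ and therefore 

\[ \int_{\overline{\mathbb{R}}^d_+} h \circ g \, dx = \infty. \]

Let us assume that $g(0) = y_\infty$. By Lemma 4.13 there exists $a \in \partial g(0)$ and 
therefore $g(x) \ge l(x) = a^T x + y_\infty$. Let $a_m$ be the minimum among the 
coordinates of the vector $a$ and $-1$. Then on $\mathbb{R}^d_+$ we have $l(x) \ge l_1(x) = 
a_m 1^T x + y_\infty$ where $a_m < 0$ and thus $l_1(x) < y_\infty$. By Lemma 3.2 we have: 

\[ \int_{\overline{\mathbb{R}}^d_+} h \circ g \, dx \ge \int_{\mathbb{R}^d_+} h \circ l_1 \, dx = \int_{-\infty}^{y_\infty} h'(y) \, \mu[(\mathrm{lev}_y l_1)^c \cap \mathbb{R}^d_+] \, dy. \]

The set $A_y = (\mathrm{lev}_y l_1)^c \cap \mathbb{R}^d_+$ is a simplex and: 

\[ \mu[A_y] = \frac{(y_\infty - y)^d}{d! \, (-a_m)^d} \]

for $y < y_\infty$. By assumption I.2 we have $h'(y) \asymp (y_\infty - y)^{-\beta - 1}$ as $y \uparrow y_\infty$ 
where $\beta > d$ and therefore: 

\[ \int_{\mathbb{R}^d_+} h \circ l_1 \, dx = +\infty, \quad \text{and hence} \quad \int_{\overline{\mathbb{R}}^d_+} h \circ g \, dx = +\infty. \]

This contradiction proves that $g(0) < y_\infty$. □ 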

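Lemma 3.4. Let $h$ be an increasing transformation and let $l(x) = a^T x + b$ 
be a linear function such that all coordinates of $a$ are negative and $b < y_\infty$. 
Then: 

\[ \int_{\mathbb{R}^d_+} h \circ l \, dx < \infty. \]

Proof. We have $l \le b$ on $\mathbb{R}^d_+$ and by Lemma 3.2: 

\[ \int_{\mathbb{R}^d_+} h \circ l \, dx = \int_{-\infty}^{b} h'(y) \, \mu[(\mathrm{lev}_y l)^c \cap \mathbb{R}^d_+] \, dy. \]

The set $A_y = (\mathrm{lev}_y l)^c \cap \mathbb{R}^d_+$ is a simplex and: 

\[ \mu[A_y] = \frac{(b - y)^d}{d! \, |{-a}|} \]

for $y < b$. By assumption I.1 we have $h'(y) = o(|y|^{-\alpha - 1})$ as $y \to -\infty$ for 
$\alpha > d$ and therefore the integral is finite. □ 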
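The computation in the proof of Lemma 3.4 can be illustrated with a concrete case. The sketch below assumes $h(y) = e^y$, $d = 2$, $a = (-1, -2)$ and $b = 0$, none of which come from the paper, and compares the direct value $e^b / (|a_1| |a_2|)$ of $\int h \circ l \, dx$ with the layer-cake integral of $h'(y) \mu[A_y]$; `midpoint` and `simplex_measure` are hypothetical helper names.

```python
import math

def midpoint(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

# Assumed example: h = exp, l(x) = a.x + b on the positive quadrant.
a = (-1.0, -2.0)
b = 0.0
d = len(a)
abs_a = math.prod(-ak for ak in a)      # |-a| = product of the -a_k

# Direct evaluation: the double integral of exp(a.x + b) factorises.
direct = math.exp(b) / abs_a

def simplex_measure(y):
    """Lebesgue measure of A_y = {x >= 0 : l(x) > y} for y < b."""
    return (b - y) ** d / (math.factorial(d) * abs_a)

# Layer-cake form from Lemma 3.2, truncated at y = b - 60.
layer_cake = midpoint(lambda y: math.exp(y) * simplex_measure(y), b - 60.0, b)
```

Here both values equal $1/2$ up to quadrature error, since $\int_{-\infty}^0 e^y y^2 \, dy = 2$.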

Lemma 3.5. Let $h$ be an increasing transformation and suppose that 
$K \subset \mathbb{R}^d_+$ is a compact set. Then there exists a closed proper convex function 
$g \in \mathcal{G}(h)$ such that $g > y_0$ on $K$. 

Proof. If $y_0 = -\infty$ then consider the function $T(c)$ defined as: 

\[ T(c) = \int_{\mathbb{R}^d_+} h \circ (-1^T x + c) \, dx. \]

By Lemma 3.4, $T(c)$ is finite for $c < y_\infty$, and by Lemma 3.3, we conclude 
that $T(y_\infty) = +\infty$. By monotone convergence $T$ is left-continuous 
for $c \in (-\infty, y_\infty]$ and by dominated convergence is right-continuous for 
$c \in (-\infty, y_\infty)$. Since $T(-\infty) = 0$ there exists $c_1 < y_\infty$ such that $T(c_1) = 1$ 
and thus the linear function $l(x) = -1^T x + c_1$ belongs to $\mathcal{G}(h)$. 

If $y_0 > -\infty$ then choose $M$ such that $1^T x < M$ on $K$. Consider the 
function $T(c)$ defined as: 

\[ T(c) = \int_{\mathbb{R}^d_+} h \circ (c(-1^T x + M) + y_0) \, dx. \]

By Lemma 3.4, $T(c)$ is finite for $c < (y_\infty - y_0)/M$ and by Lemma 3.3, 
$T((y_\infty - y_0)/M) = +\infty$. By monotone and dominated convergence $T$ is continuous 
for $c \in [0, (y_\infty - y_0)/M]$. Since $T(0) = 0$ there exists $c_1 \in (0, (y_\infty - y_0)/M)$ 
such that the linear function $l(x) = c_1(-1^T x + M) + y_0$ belongs to $\mathcal{G}(h)$. By 
construction $l > y_0$ on $K$. □ 
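The normalising constant $c_1$ in the proof above is obtained from continuity and monotonicity of $T$, which suggests a bisection search. The following is only a sketch under assumed choices ($h(y) = e^y$, $d = 1$, so $y_0 = -\infty$ and $T(c) = e^c$ exactly); the function names, the truncation at 60 and the bracket $[-5, 5]$ are illustrative and not taken from the paper.

```python
import math

def T(c, upper=60.0, n=20_000):
    """Midpoint-rule value of the integral of exp(-x + c) over [0, upper]."""
    step = upper / n
    return step * sum(math.exp(-(i + 0.5) * step + c) for i in range(n))

def solve_c1(lo=-5.0, hi=5.0, tol=1e-10):
    """Bisection for T(c1) = 1, using that T is increasing with T(lo) < 1 < T(hi)."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if T(mid) < 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

c1 = solve_c1()   # close to 0, since exp(-x) already integrates to 1
```

With this choice the resulting function $l(x) = -x + c_1$ gives the standard exponential density $h \circ l(x) = e^{-x}$ up to numerical error.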

3.2. Preliminaries: properties of decreasing transformations. 

Lemma 3.6. Let $h$ be a decreasing transformation and $g$ be a closed 
proper convex function such that 

\[ \int_{\mathbb{R}^d} h \circ g \, dx = C < \infty. \]

Then the following are true: 

1. For $y < +\infty$ the sublevel sets $\mathrm{lev}_y g$ are bounded and we have: 

\[ \mu[\mathrm{lev}_y g] \le C / h(y); \]

2. The infimum of $g$ is attained at some point $x \in \mathbb{R}^d$. 

Proof. 1. We have: 

\[ C = \int_{\mathbb{R}^d} h \circ g \, dx \ge \int_{\mathrm{lev}_y g} h \circ g \, dx \ge h(y) \, \mu[\mathrm{lev}_y g], \]

and therefore $\mu[\mathrm{lev}_y g] \le C / h(y)$. 

The sublevel set $\mathrm{lev}_y g$ has the same dimension as $\mathrm{dom}\, g$ (Theorem 
7.6 Rockafellar [1970]) which is $d$. By Lemma 4.1 this set is bounded 
when $y < y_0$. Therefore, it is enough to prove that $\mathrm{lev}_{y_0} g$ is bounded 
for $y_0 < +\infty$. 

Since $h \circ g$ is a density we have $\inf g < y_0$. If $g$ is constant on $\mathrm{dom}\, g$, 
then for all $y \in [\inf g, +\infty)$ we have $\mathrm{lev}_y g = \mathrm{lev}_{\inf g}\, g$, which is 
therefore bounded. Otherwise, we can choose $\inf g < y_1 < y_2 < y_0$. Then 
$\mu[\mathrm{lev}_{y_2} g] < \infty$, and by Lemma 4.3 we have $\mu[\mathrm{lev}_{y_0} g] < \infty$. The 
argument above shows that $\mathrm{lev}_{y_0} g$ is also bounded. 

2. Follows from the fact that $g$ is continuous and $\mathrm{lev}_y g$ is bounded and 
nonempty for $y > \inf g$. 

□ 

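Lemma 3.7. Let $h$ be a decreasing transformation, let $g$ be a closed proper 
convex function on $\mathbb{R}^d$, and let $Q$ be a $\sigma$-finite Borel measure on $\mathbb{R}^d$. Then: 

\[ \int_{(\mathrm{lev}_a g)^c} h \circ g \, dQ = -\int_{a}^{+\infty} h'(y) \, Q[\mathrm{lev}_y g \cap (\mathrm{lev}_a g)^c] \, dy. \]

Proof. Using the Fubini-Tonelli theorem we have: 

\begin{align*}
\int_{(\mathrm{lev}_a g)^c} h \circ g \, dQ
 &= \int_{(\mathrm{lev}_a g)^c} \int_0^{h(a)} 1\{z \le h \circ g(x)\} \, dz \, dQ(x) \\
 &= \int_{(\mathrm{lev}_a g)^c} \int_0^{h(a)} 1\{h^{-1}(z) \ge g(x)\} \, dz \, dQ(x) \\
 &= -\int_{(\mathrm{lev}_a g)^c} \int_{a}^{+\infty} h'(y) \, 1\{y \ge g(x)\} \, dy \, dQ(x) \\
 &= -\int_{a}^{+\infty} h'(y) \int_{(\mathrm{lev}_a g)^c} 1\{y \ge g(x)\} \, dQ(x) \, dy \\
 &= -\int_{a}^{+\infty} h'(y) \, Q[\mathrm{lev}_y g \cap (\mathrm{lev}_a g)^c] \, dy.
\end{align*}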

□ 

Lemma 3.8. Let $h$ be a decreasing transformation and let $g$ be a closed 
proper convex function such that: 

\[ \int_{\mathbb{R}^d} h \circ g \, dx < \infty. \]

Then $\inf g > y_\infty$. 

Proof. Since $g$ is proper the statement is trivial for $y_\infty = -\infty$, so we 
assume that $y_\infty > -\infty$. If for $x_0$ we have $g(x_0) = y_\infty$, then there exists a 
ball $B \equiv B(x_0; r)$ such that $g < y_\infty + \varepsilon$ on $B$. Consider the convex function 
$f$ defined as: 

\[ f(x) = y_\infty + (\varepsilon / r) \|x - x_0\| + \delta(x \mid B). \]

Then by convexity $f \ge g$ and: 

\[ \int_{\mathbb{R}^d} h \circ g \, dx \ge \int_{\mathbb{R}^d} h \circ f \, dx. \]

We have $\mu[\mathrm{lev}_y f] = S (y - y_\infty)^d$ for $y \in [y_\infty, y_\infty + \varepsilon]$ where $S$ is the Lebesgue 
measure of a unit ball $B(0; 1)$ and by Lemma 3.7 we can compute: 

\[ \int_{\mathbb{R}^d} h \circ f \, dx = -S \int_{y_\infty}^{y_\infty + \varepsilon} h'(y) (y - y_\infty)^d \, dy. \]

The assumption D.2 implies: 

\[ \int_{\mathbb{R}^d} h \circ g \, dx \ge \int_{\mathbb{R}^d} h \circ f \, dx = \infty, \]

which proves the statement. □ 

Lemma 3.9. Let $h$ be a decreasing transformation. Then for any convex 
function $g$ such that $h \circ g$ belongs to the decreasing model $\mathcal{P}(h)$ we have: 

\[ \int_{\mathbb{R}^d} [h \log h] \circ g \, dx < \infty. \]

Proof. By assumption D.1 the function $-[h \log h](y)$ is decreasing to 
zero as $y \to +\infty$ and we have: 

\[ 0 < -[h \log h](y) < C y^{-d - \alpha'} \]

for $C$ large enough and $\alpha' \in (0, \alpha)$ as $y \to +\infty$. 

By Lemma 3.6 the level sets $\mathrm{lev}_y g$ are bounded and since $h \circ g \in \mathcal{P}(h)$ we 
have $\inf g > y_\infty$. Therefore, the integral is finite if and only if the integral: 

\[ \int_{(\mathrm{lev}_a g)^c} [h \log h] \circ g \, dx \]

is finite for some $a > y_\infty$. Choosing $a$ large enough and using Lemma 3.7 
for the decreasing transformation $h_1(y) = y^{-d - \alpha'}$ we obtain: 

\begin{align*}
0 > \int_{(\mathrm{lev}_a g)^c} [h \log h] \circ g \, dx
 &\ge -C \int_{(\mathrm{lev}_a g)^c} h_1 \circ g \, dx
 \ge C \int_{a}^{+\infty} h_1'(y) \, \mu[\mathrm{lev}_y g] \, dy \\
 &= -C (d + \alpha') \int_{a}^{+\infty} y^{-d - \alpha' - 1} \mu[\mathrm{lev}_y g] \, dy.
\end{align*}

By Lemma 4.3 we have $\mu(\mathrm{lev}_y g) = O(y^d)$ and therefore the last integral is 
finite. □ 

Lemma 3.10. Let $h$ be a decreasing transformation and suppose that 
$K \subset \mathbb{R}^d$ is a compact set. Then there exists a closed proper convex function 
$g \in \mathcal{G}(h)$ such that $g < y_0$ on $K$. 

Proof. Let $B$ be a ball such that $K \subset B$. Let $c$ be such that $h(c) = 
1/\mu[B]$. Then the function $g = c + \delta(\cdot \mid B)$ belongs to $\mathcal{G}(h)$. □ 
3.3. Proofs for Existence Results. Before giving proofs of Theorems 2.13 
and 2.14, we establish two auxiliary lemmas. 

A set of points $x = \{x_i\}_{i=1}^n$ in $\mathbb{R}^d$ is in general position if for any subset 
$x' \subseteq x$ of size $d + 1$ the Lebesgue measure of $\mathrm{conv}(x')$ is not zero. 

Lemma 3.11. If $X_1, \ldots, X_n$ are i.i.d. $p_0 = h \circ g_0 \in \mathcal{P}(h)$ for a monotone 
transformation $h$, then the observations $X$ are in general position with 
probability 1. 

Proof. Points are not in general position if at least one subset $Y$ of $X$ 
of size $d + 1$ belongs to a proper linear subspace of $\mathbb{R}^d$. This is true if and 
only if $X$ as a vector in $\mathbb{R}^{nd}$ belongs to a certain non-degenerate algebraic 
variety. Since with probability 1 we have $X \subset \mathrm{dom}\, g_0$ and by definition 
$\dim(\mathrm{dom}\, g_0) = d$, the statement follows from Okamoto [1973]. □ 
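The definition of general position above translates directly into a finite check: every subset of $d + 1$ points must span a simplex of nonzero volume, that is, the matrix of edge vectors must have nonzero determinant. The following is a minimal sketch with hypothetical helper names (`det`, `in_general_position`); exact rational arithmetic is used so that degenerate configurations are detected without rounding issues.

```python
from fractions import Fraction
from itertools import combinations

def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if m[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result          # a row swap flips the sign
        result *= m[col][col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n):
                m[r][k] -= factor * m[col][k]
    return result

def in_general_position(points):
    """True if every (d+1)-subset of the points spans a nondegenerate simplex."""
    d = len(points[0])
    for subset in combinations(points, d + 1):
        base = subset[0]
        edges = [[p[k] - base[k] for k in range(d)] for p in subset[1:]]
        if det(edges) == 0:
            return False
    return True
```

For example, `in_general_position([(0, 0), (1, 0), (0, 1), (2, 3)])` holds, while any configuration containing three collinear points in the plane fails the check.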



Below we assume that our observations are in general position for any 
$n$. For an increasing model we also assume that all $X_i$ belong to $\mathbb{R}^d_+$. This 
assumption holds with probability 1 since $\mu[\overline{\mathbb{R}}^d_+ \setminus \mathbb{R}^d_+] = 0$. 

If a MLE for the model $\mathcal{P}(h)$ exists, then it maximizes the functional: 

\[ L_n g = \int (\log h) \circ g \, d\mathbb{P}_n \]

over $g \in \mathcal{G}(h)$, where the last integral is over $\mathbb{R}^d_+$ for increasing $h$ and over 
$\mathbb{R}^d$ for decreasing models. The theorem below determines the form of the 
MLE for an increasing model. We write $\mathrm{ev}_x f = (f(x_1), \ldots, f(x_n))$, $x = 
(x_1, \ldots, x_n)$ with $x_i \in \mathbb{R}^d$. 

Lemma 3.12. Consider an increasing transformation $h$. For any convex 
function $g$ with $\mathrm{dom}\, g = \overline{\mathbb{R}}^d_+$ such that $\int_{\overline{\mathbb{R}}^d_+} h \circ g \, dx \le 1$ and $L_n g > -\infty$, 
there exists $\tilde{g} \in \mathcal{G}(h)$ such that $\tilde{g} \ge g$ and $L_n \tilde{g} \ge L_n g$. The function $\tilde{g}$ can 
be chosen as a minimal element in $\mathrm{ev}_X^{-1} \tilde{p}$ where $\tilde{p} = \mathrm{ev}_X \tilde{g}$. 

Proof. Let $p = \mathrm{ev}_X g$. Since $L_n g > -\infty$ we have $g(X_i) > y_0$ for all 
$1 \le i \le n$ and therefore $g > y_0$ for $x \in \mathrm{conv}(X)$. Consider any minimal 
element $g_1$ among convex functions in $\mathrm{ev}_X^{-1} p$ (which exists by Lemma 4.15). 
Then: 

\[ \int_{\overline{\mathbb{R}}^d_+} h \circ g_1 \, dx \le \int_{\overline{\mathbb{R}}^d_+} h \circ g \, dx \le 1. \]

Since $g_1$ is polyhedral we have $g_1 = \max_i l_i$ for some linear functions $l_i(x) = 
a_i^T x + b_i$, and for each function $l_i$ there exists some facet of $g_1$ such that $g_1 = l_i$ 
on it. 

By Lemma 4.15 the interior of the facet of $g_1$ which corresponds to $l_i$ 
contains some $X_{j_i} \in X$. We have $\partial g_1(X_{j_i}) = \{a_i\}$ and $g_1(X_{j_i}) = g(X_{j_i}) > 
y_0$. Thus by Lemma 3.1, all coordinates of $a_i$ are negative and the supremum 
$M$ of $g_1$ is attained at 0. Therefore $b_i = l_i(0) \le M$. By Lemma 3.3 we have 
$M < y_\infty$. Thus by Lemma 3.4 the functions $h \circ (l_i + c)$ are integrable for 
all $c < y_\infty - M$. Since $g_1$ has only a finite number of facets we have that 
$h \circ (g_1 + c)$ is also integrable for all $c < y_\infty - M$. Finally, for $c = y_\infty - M$ the 
function $h \circ (g_1 + c)$ is not integrable by Lemma 3.3. 

The function $T(c)$ defined as: 

\[ T(c) = \int_{\overline{\mathbb{R}}^d_+} h \circ (g_1 + c) \, dx \]

is increasing, finite for $c \in [0, y_\infty - M)$ and continuous for $c \in [0, y_\infty - M]$ by 
monotone and dominated convergence. Since $T(0) \le 1$ and $T(y_\infty - M) = 
+\infty$, there exists $c_1 \in (0, y_\infty - M)$ such that $T(c_1) = 1$. Then the function 
$\tilde{g} = g_1 + c_1$ satisfies the conditions of our lemma. □ 

Theorem 3.13. If an MLE $g_0$ exists for the increasing model $\mathcal{P}(h)$, 
then there exists an MLE $g_1$ which is a minimal element in $\mathrm{ev}_X^{-1} q$ where 
$q = \mathrm{ev}_X g_0$. In other words $g_1$ is a polyhedral convex function such that 
$\mathrm{dom}\, g_1 = \overline{\mathbb{R}}^d_+$, and the interior of each facet contains at least one element of 
$X$. If $h$ is strictly increasing on $[y_0, y_\infty]$, then $g_0(x) = g_1(x)$ for all $x$ such 
that $g_0(x) > y_0$ and thus defines the same density from $\mathcal{P}(h)$. 

Proof. Let $g_0$ be any MLE. Then by Lemma 3.5 applied to $K = \mathrm{conv}(X)$ 
it follows that $L_n g_0 > -\infty$. By Lemma 3.12 there exists a function $g_1 \in \mathcal{G}(h)$ 
such that $g_1$ is a minimal element in $\mathrm{ev}_X^{-1} q_1$ where $q_1 = \mathrm{ev}_X g_1$ and $g_1 \ge g_0$. 
Since $g_0$ is a MLE we have $\mathrm{ev}_X g_0 = \mathrm{ev}_X g_1$, which together with Lemma 4.15 
proves the first part of the statement. 

By Lemma 3.3 we have $g_0 < y_\infty$ and $g_1 < y_\infty$. Since $h \circ g_0$ and $h \circ g_1$ are 
continuous functions, for the strictly increasing $h$ the equality: 

\[ \int_{\overline{\mathbb{R}}^d_+} (h \circ g_1 - h \circ g_0) \, dx = 0 \]

implies that $g_1(x) = g_0(x)$ for $x$ such that $g_0(x) > y_0$. □ 

Here are the corresponding results for decreasing transformations $h$. 

Lemma 3.14. Consider a decreasing transformation $h$. For any convex 
function $g$ such that: 

\[ \int_{\mathbb{R}^d} h \circ g \, dx \le 1 \]

and $L_n g > -\infty$ there exists $\tilde{g} \in \mathcal{G}(h)$ such that $\tilde{g} \le g$ and $L_n \tilde{g} \ge L_n g$. The 
function $\tilde{g}$ can be chosen as the maximal element in $\mathrm{ev}_X^{-1} \tilde{q}$ where $\tilde{q} = \mathrm{ev}_X \tilde{g}$. 

Proof. Let $q = \mathrm{ev}_X g$. Since $L_n g > -\infty$ we have $g(X_i) < y_0$ for all 
$1 \le i \le n$ and therefore $g < y_0$ for $x \in \mathrm{conv}(X)$. Consider the maximal 
element $g_1$ among convex functions in $\mathrm{ev}_X^{-1} q$ (which exists and is unique by 
Lemma 4.14). Then: 

\[ \int_{\mathbb{R}^d} h \circ g_1 \, dx \le \int_{\mathbb{R}^d} h \circ g \, dx \le 1. \]

By Lemma 3.6 there exists $x_0$ and $m > -\infty$ such that $g \ge g(x_0) = m$. By 
Lemma 3.8 we have $m > y_\infty$. By Lemma 4.14 we have $\mathrm{dom}\, g_1 = \mathrm{conv}(X)$ 
and therefore: 

\[ \int_{\mathbb{R}^d} h \circ (g_1 + c) \, dx \le h(m + c) \, \mu[\mathrm{conv}(X)] < \infty. \]

The function $T(c)$ defined as: 

\[ T(c) = \int_{\mathbb{R}^d} h \circ (g_1 + c) \, dx \]

is decreasing, finite for $c \in (y_\infty - m, 0]$ and continuous for $c \in [y_\infty - m, 0]$ 
by monotone and dominated convergence. Since $T(0) \le 1$ and $T(y_\infty - m) = 
+\infty$, there exists $c_1 \in (y_\infty - m, 0)$ such that $T(c_1) = 1$. Then the function 
$\tilde{g} = g_1 + c_1$ satisfies the conditions of our lemma. □ 

Theorem 3.15. If the MLE $g_0$ exists for the decreasing model $\mathcal{P}(h)$, 
then there exists another MLE $g_1$ which is the maximal element in $\mathrm{ev}_X^{-1} q$ 
where $q = \mathrm{ev}_X g_0$. In other words $g_1$ is a polyhedral convex function with 
the set of knots $K_n \subseteq X$ and domain $\mathrm{dom}\, g_1 = \mathrm{conv}(X)$. If $h$ is strictly 
decreasing on $[y_\infty, y_0]$, then $g_0(x) = g_1(x)$. 

Proof. Let $g_0$ be any MLE. Then by Lemma 3.10 applied to $K = 
\mathrm{conv}(X)$ we have that $L_n g_0 > -\infty$. By Lemma 3.14 there exists a function 
$g_1 \in \mathcal{G}(h)$ such that $g_1$ is the maximal element in $\mathrm{ev}_X^{-1} q_1$ where $q_1 = \mathrm{ev}_X g_1$ 
and $g_1 \le g_0$. Since $g_0$ is a MLE we have $\mathrm{ev}_X g_0 = \mathrm{ev}_X g_1$, which together 
with Lemma 4.14 proves the first part of the statement. 

By Lemma 3.8 we have $g_0 \ge g_1 > y_\infty$. Since $h \circ g_0$ and $h \circ g_1$ are continuous 
functions, for the strictly decreasing $h$, the equality: 

\[ \int_{\mathbb{R}^d} (h \circ g_1 - h \circ g_0) \, dx = 0 \]

implies that $g_1(x) = g_0(x)$ for $x \in \mathrm{conv}(X)$. Therefore $g_0(x) > y_\infty$ for 
$x \notin \mathrm{conv}(X)$. Since $g_0$ is convex we have $g_0 = g_1$. □ 

The bounds provided by the following key lemma are the remaining 
preparatory work for proving existence of the MLE in the case of increasing 
transformations. 

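For an increasing model $\mathcal{P}(h)$ let us denote by $\mathcal{N}(h, X, \varepsilon)$ for $\varepsilon > -\infty$ the 
family of all convex functions $g \in \mathcal{G}(h)$ such that $g$ is a minimal element in 
$\mathrm{ev}_X^{-1} q$ where $q = \mathrm{ev}_X g$ and $L_n g \ge \varepsilon$. By Lemma 3.5, the family $\mathcal{N}(h, X, \varepsilon)$ 
is not empty for $\varepsilon > -\infty$ small enough. By construction for $g \in \mathcal{N}(h, X, \varepsilon)$ 
we have $g(X_i) > y_0$ for $X_i \in X$. 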

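Lemma 3.16. There exist constants $c(x, X, \varepsilon)$ and $C(x, X, \varepsilon) < y_\infty$ which 
depend only on $x \in \overline{\mathbb{R}}^d_+$, the observations $X$, and $\varepsilon$, such that for any 
$g \in \mathcal{N}(h, X, \varepsilon)$ we have: 

\[ c(x, X, \varepsilon) \le g(x) \le C(x, X, \varepsilon). \]

Proof. By Lemma 3.1 we have: 

\[ h \circ g(X_i) \le \frac{d!}{d^d \, |X_i|}, \]

which gives the upper bounds $C(X_i, X, \varepsilon)$. By assumption we have: 

\[ \big(\max_i h \circ g(X_i)\big)^{n-1} \min_i h \circ g(X_i) \ge \prod_i h \circ g(X_i) \ge e^{n \varepsilon}, \]

and therefore: 

\[ \min_i h \circ g(X_i) \ge \frac{e^{n \varepsilon}}{h(\max_i C(X_i, X, \varepsilon))^{n-1}}, \]

which gives the uniform lower bound $c(X_i, X, \varepsilon)$ for all $X_i \in X$. Since by 
Lemma 3.1 $g(0) \ge g(X_i)$ we also obtain $c(0, X, \varepsilon)$. 

Now, we prove that there exists $C(0, X, \varepsilon)$. Let $l$ be a linear function which 
defines any facet of $g$ for which 0 is an element. By Lemma 4.15, there exists 
$X_a \in X$ which belongs to this facet. Then $g(0) = l(0)$ and $g(X_a) = l(X_a)$. 

Let us denote by $S$ the simplex $\{l = l(X_a)\} \cap \overline{\mathbb{R}}^d_+$, by $S^*$ the simplex 
$\{l \ge l(X_a)\} \cap \overline{\mathbb{R}}^d_+$ and by $l'$ the linear function which is equal to $c \equiv \min_i c(X_i, X, \varepsilon)$ 
on $S$ and to $g(0)$ at 0. By the Cauchy-Schwartz inequality (as in the proof 
of Lemma 3.1) we have: 

\[ \mu[S^*] \ge \frac{d^d \, |X_a|}{d!}. \]

We have $l \ge l'$: 

\[ 1 = \int_{\overline{\mathbb{R}}^d_+} h \circ g \, dx \ge \int_{S^*} h \circ l' \, dx. \]

By Lemma 3.2 

\[ \int_{S^*} h \circ l' \, dx \ge \frac{d^d \, |X_a|}{d!} \int_{c}^{g(0)} h'(y) \left( \frac{g(0) - y}{g(0) - c} \right)^d dy. \]

Consider the function $T(s)$ defined as: 

\[ T(s) = \frac{d^d \, |X_a|}{d!} \int_{c}^{s} h'(y) \left( \frac{s - y}{s - c} \right)^d dy. \]

If $y_\infty = +\infty$, then for a fixed $y \in (c, +\infty)$ we have: 

\[ h'(y) \, 1\{y \le s\} \left( \frac{s - y}{s - c} \right)^d \uparrow h'(y) \quad \text{as } s \to y_\infty \]

and by monotone convergence we have: 

\[ T(s) \uparrow \frac{d^d \, |X_a|}{d!} \int_{c}^{y_\infty} h'(y) \, dy = +\infty \quad \text{as } s \to y_\infty. \]

If $y_\infty < +\infty$ then for a fixed $y \in (c, y_\infty]$ we have: 

\[ h'(y) \, 1\{y \le s\} \left( \frac{s - y}{s - c} \right)^d \uparrow h'(y) \left( \frac{y_\infty - y}{y_\infty - c} \right)^d \quad \text{as } s \to y_\infty \]

and by monotone convergence we have: 

\[ T(s) \uparrow \frac{d^d \, |X_a|}{d!} \int_{c}^{y_\infty} h'(y) \left( \frac{y_\infty - y}{y_\infty - c} \right)^d dy = +\infty \quad \text{as } s \to y_\infty, \]

by assumption I.2. 

Thus there exists $s_0 \in (c, y_\infty)$ such that $T(s_0) > 1$. This implies $g(0) < s_0$. 
Since $s_0$ depends only on $X_a$ and $\min_i c(X_i, X, \varepsilon)$ this gives an upper bound 
$C(0, X, \varepsilon)$. 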

By Lemma 3.1 for any $x_0 \in \overline{\mathbb{R}}^d_+$ we can set $C(x_0, X, \varepsilon) = C(0, X, \varepsilon)$. Let 
$l(x) = a^T x + l(0)$ be a linear function which defines the facet of $g$ to which 
$x_0$ belongs. By Lemma 4.15 there exists $X_a \in X$ which belongs to this facet, 






and thus $l(X_a) = g(X_a)$. By Lemma 3.1 we have $a_k < 0$ for all $k$, and by 
definition $l(0) \le g(0)$. We have 

\[ c(X_a, X, \varepsilon) \le g(X_a) = l(X_a) = a^T X_a + l(0) \le a^T X_a + g(0), \]

and therefore 

\[ a_k \ge \frac{c(X_a, X, \varepsilon) - C(0, X, \varepsilon)}{(X_a)_k}, \qquad l(0) \ge c(X_a, X, \varepsilon). \]

Now, 

\[ g(x_0) = l(x_0) \ge \sum_k \frac{c(X_a, X, \varepsilon) - C(0, X, \varepsilon)}{(X_a)_k} (x_0)_k + c(X_a, X, \varepsilon). \]

Since we have only a finite number of possible choices for $X_a$, we obtain 
$c(x_0, X, \varepsilon)$, which concludes the proof. □ 

Now we are ready for the proof of Theorem 2.13. 

Proof. (Theorem 2.13) By Lemma 3.5 there exists $\varepsilon$ small enough such 
that the family $\mathcal{N}(h, X, \varepsilon)$ is not empty. Clearly, we can restrict MLE 
candidates $g$ to functions in the family $\mathcal{N}(h, X, \varepsilon)$. The set $N = \mathrm{ev}_X \mathcal{N}(h, X, \varepsilon)$ 
is bounded by Lemma 3.16. Let us denote by $q^*$ a point in the closure $\bar{N}$ of 
$N$ which maximizes the continuous function: 

\[ L_n(q) = \frac{1}{n} \sum_{i=1}^n \log h(q_i). \]

Since $q^* \in \bar{N}$, there exists a sequence of functions $g_k \in \mathcal{N}(h, X, \varepsilon)$ such that 
$\mathrm{ev}_X g_k$ converges to $q^*$. By Theorem 10.9 Rockafellar [1970] and Lemma 3.16 
there exists a finite convex function $g^*$ on $\mathbb{R}^d_+$ such that some subsequence 
$g_{k_l}$ converges pointwise to $g^*$. Therefore we have $\mathrm{ev}_X g^* = q^*$. Since $X \subset \mathbb{R}^d_+$ 
we can assume that $g^*$ is closed. By Fatou's lemma we have: 

\[ \int_{\mathbb{R}^d_+} h \circ g^* \, dx \le 1. \]

By Lemma 3.12 there exists $\tilde{g} \in \mathcal{G}(h)$ such that $\tilde{g} \ge g^*$ and $L_n \tilde{g} \ge L_n g^* = 
L_n(q^*)$. By assumption this implies $L_n \tilde{g} = L_n g^*$. Hence $\tilde{g}$ is the MLE. 
Finally, we have to add the almost surely clause since we assumed that the 
points $X_i$ belong to $\mathbb{R}^d_+$. □ 
Before proving existence of the MLE for a decreasing transformation 
family, we need two lemmas. 

Lemma 3.17. Consider a decreasing model $\mathcal{P}(h)$. Let $\{g_k\}$ be a sequence 
of convex functions from $\mathcal{G}(h)$, and let $\{n_k\}$ be a nondecreasing sequence of 
positive integers $n_k \ge n_d$ such that for some $\varepsilon > -\infty$ and $\rho > 0$ the following 
is true: 

1. $L_{n_k} g_k \ge \varepsilon$; 
2. if $\mu[\mathrm{lev}_{a_k} g_k] = \rho$ for some $a_k$, then $\mathbb{P}_{n_k}[\mathrm{lev}_{a_k} g_k] < d/n_d$. 

Then there exists $m > y_\infty$ such that $g_k \ge m$ for all $k$. 

Proof. Suppose, on the contrary, that $m_k \to y_\infty$ where $m_k = \min g_k$. 
The first condition implies that $X_d = \{X_1, \ldots, X_{n_d}\} \subset \mathrm{dom}\, g_k$, and therefore 
by Corollary 4.4 the function $\mu[\mathrm{lev}_y g_k]$ as a function of $y$ admits all values 
in the interval $[\mu[\mathrm{lev}_{m_k} g_k], \mu[\mathrm{conv}(X_d)]]$. If the second condition is true for 
some $\rho$ then it is also true for all $\rho' \in (0, \rho)$, and therefore we can assume 
that $\rho < \mu[\mathrm{conv}(X_d)]$. 

By Lemma 3.6 we have $\mu[\mathrm{lev}_{m_k} g_k] \to 0$, and thus there exists $a_k$ such that 
$\mu[\mathrm{lev}_{a_k} g_k] = \rho$ for all $k$ large enough. We define $A_k = \mathrm{lev}_{a_k} g_k$. By Lemma 3.6 
we have $h(a_k) \le 1/\rho$ and therefore the sequence $\{a_k\}$ is bounded below by 
some $a > y_\infty$. 

Consider $t_k > m_k$ such that $t_k \to y_\infty$. We will specify the exact form of 
$t_k$ later in the proof. Since the $a_k$ are bounded away from $y_\infty$, it follows that for 
$k$ large enough we will have $t_k < a_k$. Using Lemma 4.3 we obtain: 

\[ \rho = \mu[A_k] \le \mu[\mathrm{lev}_{t_k} g_k] \left( \frac{a_k - m_k}{t_k - m_k} \right)^d \le \frac{1}{h(t_k)} \left( \frac{a_k - m_k}{t_k - m_k} \right)^d, \]

which implies: 

\[ a_k \ge m_k + (t_k - m_k) [\rho h(t_k)]^{1/d}. \]

We have: 

\[ g_k \ge m_k 1\{A_k\} + a_k (1 - 1\{A_k\}), \]

and hence: 

\begin{align*}
L_{n_k} g_k &\le \mathbb{P}_{n_k}(A_k) \log h(m_k) + (1 - \mathbb{P}_{n_k}(A_k)) \log h(a_k) \\
 &\le \mathbb{P}_{n_k}(A_k) \log h(m_k) + (1 - \mathbb{P}_{n_k}(A_k)) \log h\big(m_k + (t_k - m_k)[\rho h(t_k)]^{1/d}\big).
\end{align*}
Case $y_\infty = -\infty$. Choose $t_k = (1 - \delta) m_k$ where $\delta \in (0, 1)$. Then starting from 
some $k$ we have $m_k < t_k$, $h(m_k) > 1$, $h(-C m_k) < 1$ and $\delta [\rho h(t_k)]^{1/d} > C + 1$. 
This implies: 

\[ m_k + (t_k - m_k)[\rho h(t_k)]^{1/d} = m_k \big(1 - \delta [\rho h(t_k)]^{1/d}\big) > -C m_k, \]

and hence: 

\begin{align*}
L_{n_k} g_k &\le \mathbb{P}_{n_k}(A_k) \log h(m_k) + (1 - \mathbb{P}_{n_k}(A_k)) \log h(-C m_k) \\
 &\le \frac{d}{n_d} \log h(m_k) + \frac{n_d - d}{n_d} \log h(-C m_k)
 = \frac{d}{n_d} \log \big[ h(m_k) \, h(-C m_k)^{(n_d - d)/d} \big] \to -\infty.
\end{align*}

Case $y_\infty > -\infty$. Without loss of generality we can assume that $y_\infty = 0$. 
Choose $t_k = (1 + \delta) m_k$ where $\delta > 0$. Then: 

\[ m_k + (t_k - m_k)[\rho h(t_k)]^{1/d} \ge m_k \delta [\rho h((1 + \delta) m_k)]^{1/d} \asymp m_k^{-(\beta - d)/d} \to +\infty, \]

which implies 

\[ h\big(m_k + (t_k - m_k)[\rho h(t_k)]^{1/d}\big) = o\big(m_k^{\alpha(\beta - d)/d}\big). \]

This in turn yields 

\[ \exp(L_{n_k} g_k) = o\Big( m_k^{-\frac{\beta d}{n_d} + \frac{\alpha (\beta - d)(n_d - d)}{d \, n_d}} \Big) = o(1). \]

Therefore in both cases we obtained $L_{n_k} g_k \to -\infty$. This contradiction 
concludes the proof. □ 

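For a decreasing model $\mathcal{P}(h)$, let us denote by $\mathcal{N}(h, X, \varepsilon)$ for $\varepsilon > -\infty$ the 
family of all convex functions $g \in \mathcal{G}(h)$ such that $g$ is a maximal element in 
$\mathrm{ev}_X^{-1} q$ where $q = \mathrm{ev}_X g$, and $L_n g \ge \varepsilon$. By Lemma 3.10 the family $\mathcal{N}(h, X, \varepsilon)$ is 
not empty for $\varepsilon > -\infty$ small enough. By construction for $g \in \mathcal{N}(h, X, \varepsilon)$ we 
have $g(X_i) < y_0$ for $X_i \in X$. 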

Lemma 3.18. For given observations $X = (X_1, \ldots, X_n)$ such that $n \ge 
n_d$, there exist constants $m > y_\infty$ and $M$ which depend only on the observations 
$X$ and $\varepsilon$ such that for any $g \in \mathcal{N}(h, X, \varepsilon)$ we have $m \le g(x) \le M$ on 
$\mathrm{conv}(X)$. 

Proof. Since by assumption the points $X$ are in general position, there 
exists $\rho > 0$ such that for any $d$-dimensional simplex $S$ with vertices from 
$X$ we have $\mu[S] > \rho$. Then any convex set $C \subseteq \mathrm{conv}(X)$ such that $\mu[C] = \rho$ 
cannot contain more than $d$ points from $X$. Therefore, we have $\mathbb{P}_n[C] \le 
d/n \le d/n_d$. 

An arbitrary sequence of functions $\{g_k\}$ from $\mathcal{N}(h, X, \varepsilon)$ satisfies the 
conditions of Lemma 3.17 with $n_k \equiv n$, the same $\varepsilon$ and $\rho$ constructed above. 
Therefore the sequence $\{g_k\}$ is bounded below by some constant greater than 
$y_\infty$. Thus the family of functions $\mathcal{N}(h, X, \varepsilon)$ is uniformly bounded below by 
some $m > y_\infty$. 

Consider any $g \in \mathcal{N}(h, X, \varepsilon)$. Let $M_g$ be the supremum of $g$ on $\mathrm{dom}\, g$. 
By Theorem 32.2 Rockafellar [1970], the supremum is attained at some 
$X_M \in X$ and therefore $M_g < y_0$. Let $m_g$ be the minimum of $g$ on $X$. We 
have: 

\[ h(m_g)^{n-1} h(M_g) \ge e^{n \varepsilon}, \qquad h(M_g) \ge \frac{e^{n \varepsilon}}{h(m_g)^{n-1}} \ge \frac{e^{n \varepsilon}}{h(m)^{n-1}}. \]

Thus we obtained an upper bound $M$ which depends only on $m$ and $X$. □ 

Now we are ready for the proof of Theorem 2.14. 

Proof. (Theorem 2.14) By Lemma 3.10 there exists $\varepsilon$ small enough such 
that the family $\mathcal{N}(h, X, \varepsilon)$ is not empty. Clearly, we can restrict MLE 
candidates to the functions in the family $\mathcal{N}(h, X, \varepsilon)$. The set $N = \mathrm{ev}_X \mathcal{N}(h, X, \varepsilon)$ 
is bounded by Lemma 3.18. Let us denote by $q^*$ the point in the closure $\bar{N}$ 
of $N$ which maximizes the continuous function: 

\[ L_n(q) = \frac{1}{n} \sum_{i=1}^n \log h(q_i). \]

Since $q^* \in \bar{N}$, there exists a sequence of functions $g_k \in \mathcal{N}(h, X, \varepsilon)$ such 
that $\mathrm{ev}_X g_k$ converges to $q^*$. By Lemma 3.18 the functions $f_k = \sup_{l \ge k} g_l$ 
are finite convex functions on $\mathrm{conv}(X)$, the sequence $\{f_k(x)\}$ is monotone 
decreasing for each $x \in \mathrm{conv}(X)$ and bounded below. Therefore $f_k \downarrow g^*$ for 
some convex function $g^*$, and by construction $\mathrm{ev}_X g^* = q^*$. We have: 

\[ \int_{\mathbb{R}^d} h \circ f_k \, dx \le \int_{\mathbb{R}^d} h \circ g_k \, dx = 1 \]

and thus by Fatou's lemma: 

\[ \int_{\mathbb{R}^d} h \circ g^* \, dx \le 1. \]

By Lemma 3.14 there exists $\tilde{g} \in \mathcal{G}(h)$ such that $\tilde{g} \le g^*$ and $L_n \tilde{g} \ge L_n g^* = 
L_n(q^*)$. By assumption this implies $L_n \tilde{g} = L_n g^*$. Thus the function $\tilde{g}$ is the 
MLE. 

Finally, we have to add the almost surely clause since we assumed that 
the points $X_i$ are in general position. □ 

3.4. Proofs for Consistency Results. We begin with proofs for some tech- 
nical results which we will use in the consistency arguments for both in- 
creasing and decreasing models. The main argument for proving Hellinger 
consistency proceeds along the lines of the proof given in the case of d = 1 
by Pal, Woodroofe and Meyer [2007] . 

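Lemma 3.19. Consider a monotone model $\mathcal{P}(h)$. Suppose the true 
density $h \circ g_0$ and the sequence of MLEs $\{\hat{g}_n\}$ have the following properties: 

\[ \int (h \log h) \circ g_0(x) \, dx < \infty, \]

and 

\[ \int \log[\varepsilon + h \circ \hat{g}_n(x)] \, d(\mathbb{P}_n(x) - P_0(x)) \to_{a.s.} 0, \]

for $\varepsilon > 0$ small enough. Then the sequence of the MLEs is Hellinger 
consistent: 

\[ H(h \circ \hat{g}_n, h \circ g_0) \to_{a.s.} 0. \]

Proof. For $\varepsilon \in (0, 1)$ we have: 

\[ 0 \ge \int_{\{h \circ g_0(x) \le 1 - \varepsilon\}} \log(\varepsilon + h \circ g_0) \, dP_0 \ge \log(\varepsilon) \, P_0\{h \circ g_0(x) \le 1 - \varepsilon\} > -\infty, \]

\[ 0 \le \int_{\{h \circ g_0(x) > 1 - \varepsilon\}} \log(\varepsilon + h \circ g_0) \, dP_0 \le \int_{\{h \circ g_0(x) > 1 - \varepsilon\}} \log(2 h \circ g_0) \, dP_0 \le \int (h \log h) \circ g_0(x) \, dx + \log 2 < \infty. \]

Thus the function $\log(\varepsilon + h \circ g_0)$ is integrable with respect to the probability 
measure $P_0$. 

We can rearrange: 

\begin{align*}
0 \le L_n \hat{g}_n - L_n g_0
 &= \int \log[h \circ \hat{g}_n] \, d\mathbb{P}_n - \int \log[h \circ g_0] \, d\mathbb{P}_n \\
 &\le \int \log[\varepsilon + h \circ \hat{g}_n] \, d\mathbb{P}_n - \int \log[h \circ g_0] \, d\mathbb{P}_n \\
 &= \int \log[\varepsilon + h \circ \hat{g}_n] \, d(\mathbb{P}_n - P_0) && (3.12) \\
 &\quad + \int \log\left[ \frac{\varepsilon + h \circ \hat{g}_n}{\varepsilon + h \circ g_0} \right] dP_0 && (3.13) \\
 &\quad + \int \log[\varepsilon + h \circ g_0] \, dP_0 - \int \log[h \circ g_0] \, d\mathbb{P}_n. && (3.14)
\end{align*}

The term (3.12) converges almost surely to zero by assumption. 

For the term (3.13) we can apply the analogue of Lemma 1 from Pal et al. 
[2007]: 

\[ II = \int \log\left[ \frac{\varepsilon + h \circ \hat{g}_n}{\varepsilon + h \circ g_0} \right] dP_0 \le 2 \int \sqrt{\frac{\varepsilon}{\varepsilon + h \circ g_0}} \, dP_0 - 2 H^2(h \circ \hat{g}_n, h \circ g_0). \]

For the term (3.14), the SLLN implies that: 

\begin{align*}
III &= \int \log[\varepsilon + h \circ g_0] \, dP_0 - \int \log[h \circ g_0] \, d\mathbb{P}_n \\
 &\to_{a.s.} \int \log[\varepsilon + h \circ g_0] \, dP_0 - \int \log[h \circ g_0] \, dP_0 = \int \log\left[ \frac{\varepsilon + h \circ g_0}{h \circ g_0} \right] dP_0.
\end{align*}

Thus we have: 

\begin{align*}
0 &\le \liminf (I + II + III) \\
 &\le_{a.s.} -\limsup 2 H^2(h \circ \hat{g}_n, h \circ g_0) + 2 \int \sqrt{\frac{\varepsilon}{\varepsilon + h \circ g_0}} \, dP_0 + \int \log\left[ \frac{\varepsilon + h \circ g_0}{h \circ g_0} \right] dP_0.
\end{align*}

This yields 

\[ \limsup H^2(h \circ \hat{g}_n, h \circ g_0) \le_{a.s.} \int \sqrt{\frac{\varepsilon}{\varepsilon + h \circ g_0}} \, dP_0 + \frac{1}{2} \int \log\left[ \frac{\varepsilon + h \circ g_0}{h \circ g_0} \right] dP_0 \to 0 \]

as $\varepsilon \downarrow 0$ by monotone convergence. □ 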
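The bound used for the term (3.13) can be sanity-checked numerically on a toy pair of densities. The sketch below is an illustration only: it assumes the normalisation $H^2(p, q) = \frac{1}{2} \int (\sqrt{p} - \sqrt{q})^2 dx$ for the squared Hellinger distance, takes $e^{-x}$ on $[0, \infty)$ as a stand-in for the true density and $2 e^{-2x}$ as a stand-in for the estimate, and uses a hypothetical `midpoint` helper on a truncated range.

```python
import math

def midpoint(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    step = (hi - lo) / n
    return step * sum(f(lo + (i + 0.5) * step) for i in range(n))

eps = 0.1
f0 = lambda x: math.exp(-x)               # stand-in for the true density
fn = lambda x: 2.0 * math.exp(-2.0 * x)   # stand-in for the estimated density
upper = 40.0                              # truncation point; tails are negligible

# Squared Hellinger distance under the assumed 1/2 normalisation.
hellinger_sq = 0.5 * midpoint(
    lambda x: (math.sqrt(f0(x)) - math.sqrt(fn(x))) ** 2, 0.0, upper)

# Left side of the bound: integral of log((eps + fn) / (eps + f0)) against f0.
lhs = midpoint(
    lambda x: math.log((eps + fn(x)) / (eps + f0(x))) * f0(x), 0.0, upper)

# Right side: 2 * integral of sqrt(eps / (eps + f0)) against f0, minus 2 H^2.
rhs = 2.0 * midpoint(
    lambda x: math.sqrt(eps / (eps + f0(x))) * f0(x), 0.0, upper) - 2.0 * hellinger_sq
```

In this example the left side is much smaller than the right side; the inequality itself follows from $\log x \le 2(\sqrt{x} - 1)$ and $\sqrt{\varepsilon + u} \le \sqrt{\varepsilon} + \sqrt{u}$.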



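The next lemma allows us to obtain pointwise consistency once Hellinger 
consistency has been proved. 

Lemma 3.20. Suppose that for a monotone model $\mathcal{P}(h)$ a sequence of 
MLEs $\hat{g}_n$ is Hellinger consistent. Then the sequence $\hat{g}_n$ is pointwise 
consistent. In other words $\hat{g}_n(x) \to_{a.s.} g_0(x)$ for $x \in \mathrm{ri}(\mathrm{dom}\, g_0)$ and convergence 
is uniform on compacta. 

Proof. Let us denote by $L^0_a$ and $L^n_a$ the following sublevel sets: 

\[ L^0_a = \mathrm{lev}_a g_0, \qquad L^n_a = \mathrm{lev}_a \hat{g}_n. \]

Consider $\Omega_0$ such that $\Pr[\Omega_0] = 1$ and $H^2(h \circ \hat{g}^\omega_n, h \circ g_0) \to 0$ where $\hat{g}^\omega_n$ is 
the MLE for $\omega \in \Omega_0$. For all $\omega \in \Omega_0$ we have: 

\begin{align*}
\int [\sqrt{h \circ g_0} - \sqrt{h \circ \hat{g}_n}]^2 dx
 &\ge \int_{L^0_a \setminus L^n_{a + \varepsilon}} [\sqrt{h \circ g_0} - \sqrt{h \circ \hat{g}_n}]^2 dx \\
 &\ge (\sqrt{h(a)} - \sqrt{h(a + \varepsilon)})^2 \mu(L^0_a \setminus L^n_{a + \varepsilon}) \to 0,
\end{align*}

and by Lemma 4.2 we have: 

\[ \liminf \mathrm{ri}(L^0_a \cap L^n_{a + \varepsilon}) = \mathrm{ri}(L^0_a). \]

Therefore $\limsup \hat{g}_n(x) \le a + \varepsilon$ for $x \in \mathrm{ri}(L^0_a)$. Since $a$ and $\varepsilon$ are arbitrary 
we have $\limsup \hat{g}_n \le g_0$ on $\mathrm{ri}(\mathrm{dom}\, g_0)$. 

On the other hand, we have: 

\begin{align*}
\int [\sqrt{h \circ g_0} - \sqrt{h \circ \hat{g}_n}]^2 dx
 &\ge \int_{L^n_{a - \varepsilon} \setminus L^0_a} [\sqrt{h \circ g_0} - \sqrt{h \circ \hat{g}_n}]^2 dx \\
 &\ge (\sqrt{h(a - \varepsilon)} - \sqrt{h(a)})^2 \mu(L^n_{a - \varepsilon} \setminus L^0_a) \to 0,
\end{align*}

and by Lemma 4.2 we have: 

\[ \limsup \mathrm{cl}(L^n_{a - \varepsilon} \cup L^0_a) = \mathrm{cl}(L^0_a). \]

Therefore $\liminf \hat{g}_n(x) \ge a - \varepsilon$ for $x$ such that $g_0(x) \ge a$. Since $a$ and $\varepsilon$ are 
arbitrary we have $\liminf \hat{g}_n \ge g_0$ on $\mathrm{dom}\, g_0$. 

Thus $\hat{g}_n \to g_0$ almost surely on $\mathrm{ri}(\mathrm{dom}\, g_0)$. By Theorem 10.8 Rockafellar 
[1970] convergence is uniform on compacta $K \subset \mathrm{ri}(\mathrm{dom}\, g_0)$. □ 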

We need a general property of the bracketing entropy numbers. 

Lemma 3.21. Let $\mathcal{A}$ be a class of sets in $\mathbb{R}^d$ such that the class $\mathcal{A} \cap [-a, a]^d$ 
has finite bracketing entropy with respect to Lebesgue measure $\lambda$ for any $a$ 
large enough: 

\[ \log N_{[\,]}(\varepsilon, \mathcal{A} \cap [-a, a]^d, L_1(\lambda)) < +\infty \]

for every $\varepsilon > 0$. Then for any Lebesgue absolutely continuous probability 
measure $P$ with bounded density we have that $\mathcal{A}$ is a Glivenko-Cantelli class: 

\[ \|\mathbb{P}_n - P\|_{\mathcal{A}} \to_{a.s.} 0. \]

Proof. Let $C$ be an upper bound for the density of $P$ and $a$ be large so 
that for the set $D = [-a, a]^d$ we have $P([-a, a]^d) > 1 - \varepsilon / 2C$. By assumption 
the class $\mathcal{A} \cap D$ has a finite set of $\varepsilon/2$-brackets $\{[L_i, U_i]\}$. Then for any set 
$A \in \mathcal{A}$ there exists an index $i$ such that: 

\[ L_i \subseteq A \cap D \subseteq U_i. \]

Therefore: 

\[ L_i \subseteq A \subseteq U_i \cup D^c \]

and: 

\begin{align*}
\|1\{U_i \cup D^c\} - 1\{L_i\}\|_{L_1(P)} &\le \|1\{U_i\} - 1\{L_i\}\|_{L_1(P)} + \|1\{D^c\}\|_{L_1(P)} \\
 &\le C \big( \|1\{U_i\} - 1\{L_i\}\|_{L_1(\lambda)} + \|1\{D^c\}\|_{L_1(\lambda)} \big) \le \varepsilon.
\end{align*}

Thus the set $\{[L_i, U_i \cup D^c]\}$ is the set of $\varepsilon$-brackets for our class $\mathcal{A}$ in $L_1(P)$. 
This implies that $\mathcal{A}$ is a Glivenko-Cantelli class and the statement follows 
from Theorem 2.4.1 van der Vaart and Wellner [1996]. □ 

To prove consistency for increasing models we begin with a general prop- 
erty of lower layer sets (see Dudley [1999], Chapter 8.3). 

Lemma 3.22. Let $\mathcal{LL}$ be the class of closed lower layer sets in $\overline{\mathbb{R}}^d_+$ and $P$ 
be a Lebesgue absolutely continuous probability measure with bounded 
density. Then: 

\[ \|\mathbb{P}_n - P\|_{\mathcal{LL}} \to_{a.s.} 0. \]

Proof. By Theorem 8.3.2 Dudley [1999] we have 

\[ \log N_{[\,]}(\varepsilon, \mathcal{LL} \cap [0, 1]^d, L_1(\lambda)) < +\infty. \]

Since the class $\mathcal{LL}$ is invariant under rescaling, the result follows from 
Lemma 3.21. □ 

Note that Lemma 3.1 implies that if $h \circ g$ belongs to an increasing model 
$\mathcal{P}(h)$ then $(\mathrm{lev}_y g)^c$ is a lower layer set and has Lebesgue measure less than or 
equal to $1/h(y)$. Let us denote by $A_\delta$ the set $\{|x| \le \delta, x \in \overline{\mathbb{R}}^d_+\}$. Then by 
Lemma 3.1 part 3 we have: 

\[ (\mathrm{lev}_y g)^c \subseteq A_{c/h(y)}, \qquad (3.15) \]

for $c \equiv d!/d^d$. 
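Proof. (Theorem 2.16) By Lemma 3.19 it is enough to show that: 

\[ \int \log[\varepsilon + h \circ \hat{g}_n(x)] \, d(\mathbb{P}_n(x) - P_0(x)) \to_{a.s.} 0. \]

Indeed, applying Lemma 3.2 for the increasing transformation $\log[\varepsilon + h(y)] - 
\log \varepsilon$ we obtain: 

\begin{align*}
\left| \int \log[\varepsilon + h \circ \hat{g}_n(x)] \, d(\mathbb{P}_n(x) - P_0(x)) \right|
 &= \left| \int_{-\infty}^{+\infty} \left[ \frac{h'(z)}{\varepsilon + h(z)} \right] (\mathbb{P}_n - P_0)(\mathrm{lev}_z \hat{g}_n)^c \, dz \right| \\
 &\le \int_{-\infty}^{M} \left[ \frac{h'(z)}{\varepsilon + h(z)} \right] |\mathbb{P}_n - P_0| (\mathrm{lev}_z \hat{g}_n)^c \, dz
   + \int_{M}^{+\infty} \left[ \frac{h'(z)}{\varepsilon + h(z)} \right] (\mathbb{P}_n + P_0)(\mathrm{lev}_z \hat{g}_n)^c \, dz \\
 &\le \|\mathbb{P}_n - P_0\|_{\mathcal{LL}} \log\left[ \frac{\varepsilon + h(M)}{\varepsilon} \right]
   + \int_{M}^{+\infty} \left[ \frac{h'(z)}{\varepsilon + h(z)} \right] (\mathbb{P}_n + P_0)(\mathrm{lev}_z \hat{g}_n)^c \, dz.
\end{align*}

The first term converges to zero almost surely by Lemma 3.22. For the second 
term we will use the inclusion (3.15): 

\[ \int_{M}^{+\infty} \left[ \frac{h'(z)}{\varepsilon + h(z)} \right] (\mathbb{P}_n + P_0)(\mathrm{lev}_z \hat{g}_n)^c \, dz \le \int_{M}^{+\infty} \left[ \frac{h'(z)}{\varepsilon + h(z)} \right] (\mathbb{P}_n + P_0)(A_{c/h(z)}) \, dz. \]

Now, we can apply Lemma 3.2 again for $g_A(x) = h^{-1}(c/|x|)$. We have 
$(\mathrm{lev}_z g_A)^c = A_{c/h(z)}$ and therefore: 

\begin{align*}
\int_{M}^{+\infty} \left[ \frac{h'(z)}{\varepsilon + h(z)} \right] (\mathbb{P}_n + P_0)(A_{c/h(z)}) \, dz
 &\le \int_{A_{c/h(M)}} \log(\varepsilon + c/|x|) \, d(\mathbb{P}_n + P_0) \\
 &\le \int_{A_{c/h(M)}} \log(2c/|x|) \, d(\mathbb{P}_n + P_0),
\end{align*}

for $M$ large enough. Assumption I.5 and the SLLN imply that: 

\[ \int_{A_{c/h(M)}} \log(2c/|x|) \, d(\mathbb{P}_n + P_0) \to_{a.s.} 2 \int_{A_{c/h(M)}} \log(2c/|x|) \, dP_0. \]

Since $M$ is arbitrary and $A_{c/h(M)} \downarrow \{0\}$ as $M \to +\infty$ the result follows. □ 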

By Lemma 3.1 we have $\mathrm{ri}(\overline{\mathbb{R}}^d_+) \subseteq \mathrm{dom}\, g_0$. Thus Theorem 2.16 and Lemma 3.20 
imply Theorem 2.17. 

Theorem 3.23. For an increasing model $\mathcal{P}(h)$ and a true density $h \circ g_0$ 
which satisfies the assumptions I.4, I.5 and I.6 the sequence of MLEs 
$\hat{g}_n$ is pointwise consistent. That is, $\hat{g}_n(x) \to_{a.s.} g_0(x)$ for $x \in \mathrm{ri}(\overline{\mathbb{R}}^d_+)$ and 
convergence is uniform on compacta. 






Finally, we prove consistency for decreasing models. We need a general 
property of convex sets. 

Lemma 3.24. Let $\mathcal{A}$ be the class of closed convex sets $A$ in $\mathbb{R}^d$ and $P$ be 
a Lebesgue absolutely continuous probability measure with bounded density. 
Then: 

\[ \|\mathbb{P}_n - P\|_{\mathcal{A}} \to_{a.s.} 0. \]

Proof. Let $D$ be a convex compact set. By Theorem 8.4.2 Dudley [1999] 
the class $\mathcal{A} \cap D$ has a finite set of $\varepsilon$-brackets. Since the class $\mathcal{A}$ is invariant 
under rescaling, the result follows from Lemma 3.21. □ 

Lemma 3.25. For a decreasing model $\mathcal{P}(h)$ the sequence of MLEs $\hat{g}_n$ is 
almost surely uniformly bounded below. 

Proof. We will apply Lemma 3.17 to the sequences $\{\hat{g}_n\}$ and $\{n\}$. By the 
SLLN and Lemma 3.9 we have: 

\[ L_n \hat{g}_n \ge L_n g_0 \to_{a.s.} \int [h \log h] \circ g_0 \, dx > -\infty. \]

Therefore the sequence $\{L_n \hat{g}_n\}$ is bounded away from $-\infty$, and the first 
condition of Lemma 3.17 is true. 

Choose some $a \in (0, d/n_d)$. Then for any set $S$ such that $\mu[S] = \rho \equiv 
a/h(\min g_0)$, where $\min g_0$ is attained by Lemma 3.6, we have: 

\[ P[S] = \int_S h \circ g_0 \, dx \le \mu[S] \, h(\min g_0) = a < d/n_d. \]

Now, let $A_n = \mathrm{lev}_{a_n} \hat{g}_n$ be sets such that $\mu[A_n] = \rho$. Then, by Lemma 3.24 
we have: 

\[ |\mathbb{P}_n[A_n] - P[A_n]| \le \|\mathbb{P}_n - P\|_{\mathcal{A}} \to_{a.s.} 0, \]

which implies that $\mathbb{P}_n[A_n] < d/n_d$ almost surely for $n$ large enough. 
Therefore, the second condition of Lemma 3.17 is true and it is applicable to the 
sequence $\hat{g}_n$ almost surely. □ 

Proof. (Theorem 2.18) By Lemma 3.9 and Lemma 3.19 it is enough to
show that:

\[ \int \log[e + h \circ \hat g_n(x)]\, d(\mathbb{P}_n(x) - P_0(x)) \to_{a.s.} 0. \]

By Lemma 3.25 we have $\inf \hat g_n \ge A$ for some $A > y_0$. Therefore, by
Lemma 3.7 applied to the decreasing transformation $\log[e + h(y)] - \log e$
it follows:

\[
\left| \int \log[e + h \circ \hat g_n(x)]\, d(\mathbb{P}_n(x) - P_0(x)) \right|
= \left| \int_A^{+\infty} \frac{-h'(z)}{e + h(z)}\, (\mathbb{P}_n - P_0)(\mathrm{lev}_z\, \hat g_n)\, dz \right|
\]
\[
\le \|\mathbb{P}_n - P_0\|_{\mathcal{A}} \int_A^{+\infty} \frac{-h'(z)}{e + h(z)}\, dz
= \|\mathbb{P}_n - P_0\|_{\mathcal{A}} \log \frac{e + h(A)}{e} \to_{a.s.} 0,
\]

where the last limit follows from Lemma 3.24. □

Proof. (Theorem 2.19) By Lemma 3.20, $\hat g_n \to g_0$ almost surely on $\mathrm{ri}(\mathrm{dom}\, g_0)$.
The functions $g_0$ and $g_0^*$ differ only on the boundary $\partial\, \mathrm{dom}\, g_0$, which has Lebesgue
measure zero by Lemma 4.1. Since the observations $X_i \in \mathrm{ri}(\mathrm{dom}\, g_0)$ almost
surely, we have $\hat g_n = +\infty$ on $\partial\, \mathrm{dom}\, g_0$ and thus $\hat g_n \to g_0^*$.

Now, we assume that $\mathrm{dom}\, g_0 = \mathbb{R}^d$. By Lemma 3.6 the function $g_0$ has
bounded sublevel sets and therefore there exists $x_0$ where $g_0$ attains its
minimum $m$. Since $h \circ g_0$ is a density we have $h(m) > 0$, and by Lemma 3.8 we
have $h(m) < \infty$. Fix $\varepsilon > 0$ such that $h(m) > 3\varepsilon$ and consider $a$ such that
$h(a) < \varepsilon$. The set $A = \mathrm{lev}_a\, g_0$ is bounded and by continuity $g_0 = a$ on $\partial A$.
Choose $\delta > 0$ such that $h(a - \delta) < 2\varepsilon < h(m + \delta)$ and:

\[ \sup_{x \in [m,\, a + \delta]} |h(x) - h(x - \delta)| < \varepsilon. \]

The closure $\bar A$ is compact and thus for $n$ large enough we have with probability one:

\[ \sup_{\bar A} |\hat g_n - g_0| < \delta, \]

which implies:

\[ \sup_{\bar A} |h \circ \hat g_n - h \circ g_0| < \varepsilon, \]

since the range of values of $g_0$ on $\bar A$ is $[m, a]$. The set $\partial A$ is compact and
therefore $\hat g_n$ attains its minimum $m_n$ on this set at some point $x_n$. By construction:

\[ m_n = \hat g_n(x_n) > g_0(x_n) - \delta = a - \delta > m + \delta = g_0(x_0) + \delta > \hat g_n(x_0). \]

We have $x_0 \in A \cap \mathrm{lev}_{a - \delta}\, \hat g_n$ and $\hat g_n \ge m_n > a - \delta$ on $\partial A$. Thus, by convexity
we have $\mathrm{lev}_{a-\delta}\, \hat g_n \subset A$, and for $x \notin A$ we have:

\[ |h \circ \hat g_n(x) - h \circ g_0(x)| \le h \circ \hat g_n(x) + h \circ g_0(x) < h(a - \delta) + h(a) < 3\varepsilon. \]

This shows that for any $\varepsilon > 0$ small enough we will have:

\[ \|h \circ \hat g_n - h \circ g_0\|_\infty < 3\varepsilon \]

with probability one as $n \to \infty$. This concludes the proof. □

3.5. Proofs for Lower Bound Results. We will use the following lemma 
for computing the Hellinger distance between a function and its local defor- 
mation: 

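Lemma 3.26. Let $\{g_\varepsilon\}$ be a local deformation of the function $g : \mathbb{R}^d \to \mathbb{R}$
at the point $x_0$, such that $g$ is continuous at $x_0$, and let the function $h : \mathbb{R} \to
\mathbb{R}$ be continuously differentiable at the point $g(x_0)$. Then for any $r > 0$:

\[ \lim_{\varepsilon \to 0} \int_{\mathbb{R}^d} |g_\varepsilon(x) - g(x)|^r\, dx = 0, \tag{3.16} \]

\[ \lim_{\varepsilon \to 0} \frac{\int_{\mathbb{R}^d} |h \circ g_\varepsilon(x) - h \circ g(x)|^r\, dx}{\int_{\mathbb{R}^d} |g_\varepsilon(x) - g(x)|^r\, dx} = |h' \circ g(x_0)|^r. \tag{3.17} \]

Proof. Since $\{g_\varepsilon\}$ is a local deformation, for $\varepsilon > 0$ small enough we
have:

\[ \int_{\mathbb{R}^d} |h \circ g_\varepsilon(x) - h \circ g(x)|^r\, dx = \int_{B(x_0; r_\varepsilon)} |h \circ g_\varepsilon(x) - h \circ g(x)|^r\, dx, \]

\[ \int_{\mathbb{R}^d} |g_\varepsilon(x) - g(x)|^r\, dx = \int_{B(x_0; r_\varepsilon)} |g_\varepsilon(x) - g(x)|^r\, dx. \]

Then:

\[ \int_{\mathbb{R}^d} |g_\varepsilon - g|^r\, dx \le \operatorname{ess\,sup} |g_\varepsilon - g|^r\, \mu[B(x_0; r_\varepsilon)] \]

implies (3.16).

Let us define a sequence $\{a_\varepsilon\}$:

\[ a_\varepsilon = \operatorname{ess\,sup} |g_\varepsilon - g| + \sup_{x \in B(x_0; r_\varepsilon)} |g(x) - g(x_0)|. \]

For $x \in B(x_0; r_\varepsilon)$ and $y \in [g_\varepsilon(x), g(x)]$ we have a.e.:

\[ |y - g(x_0)| \le |g_\varepsilon(x) - g(x)| + |g(x) - g(x_0)| \le a_\varepsilon. \]

Using the mean value theorem we obtain:

\[ \int_{\mathbb{R}^d} |h \circ g_\varepsilon(x) - h \circ g(x)|^r\, dx = \int_{\mathbb{R}^d} |h'(y_x)|^r\, |g_\varepsilon(x) - g(x)|^r\, dx, \]

\[ \inf_{y \in B(g(x_0); a_\varepsilon)} |h'(y)|^r \le \frac{\int_{\mathbb{R}^d} |h \circ g_\varepsilon(x) - h \circ g(x)|^r\, dx}{\int_{\mathbb{R}^d} |g_\varepsilon(x) - g(x)|^r\, dx} \le \sup_{y \in B(g(x_0); a_\varepsilon)} |h'(y)|^r. \]

Since $h'$ is continuous at $g(x_0)$, to prove (3.17) it is enough to show that
$a_\varepsilon \to 0$. By assumption we have:

\[ \lim_{\varepsilon \to 0} \operatorname{ess\,sup} |g_\varepsilon - g| = 0. \]

Since $g$ is continuous at $x_0$ and $r_\varepsilon \to 0$ we have:

\[ \lim_{\varepsilon \to 0} \sup_{x \in B(x_0; r_\varepsilon)} |g(x) - g(x_0)| = 0. \]

Thus $a_\varepsilon \to 0$, which proves (3.17). □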
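
The limit (3.17) can be checked numerically in one dimension. The following is a minimal sketch under illustrative assumptions that are not taken from the text: $g(x) = x^2/2$, $x_0 = 0$, $h(y) = e^{-y}$, and the deformation $g_\varepsilon = \max(g, \varepsilon)$ (a deformation of the max-type used later in this section). The helper name `ratio` is hypothetical. Since $|h'(g(x_0))| = 1$, the ratio should approach 1 as $\varepsilon \to 0$.

```python
import math

def ratio(eps, r=2.0, n=20001):
    """Ratio of the two integrals in (3.17) for g(x) = x^2/2, h(y) = exp(-y), x0 = 0."""
    radius = math.sqrt(2.0 * eps)      # g_eps differs from g only where x^2/2 < eps
    num = 0.0
    den = 0.0
    for i in range(n):
        x = -radius + 2.0 * radius * i / (n - 1)
        g = 0.5 * x * x
        g_eps = max(g, eps)            # assumed deformation: max(g, l0 + eps), with l0 = 0
        num += abs(math.exp(-g_eps) - math.exp(-g)) ** r
        den += abs(g_eps - g) ** r
    return num / den                   # the common grid step cancels in the ratio

for eps in (1e-1, 1e-2, 1e-3):
    print(eps, ratio(eps))
```

The sums are plain Riemann sums on a common grid, so no quadrature library is needed; the printed ratios should increase towards 1 as `eps` decreases.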

In order to apply Corollary 2.22 we need to construct deformations so that 
they still belong to the class Q. The following lemma provides a technique 
for constructing such deformations. 

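Lemma 3.27. Let $\{g_\varepsilon\}$ be a local deformation of the function $g : \mathbb{R}^d \to \mathbb{R}$
at the point $x_0$, such that $g$ is continuous at $x_0$, and let the function $h : \mathbb{R} \to
\mathbb{R}$ be continuously differentiable at the point $g(x_0)$ so that $h' \circ g(x_0) \ne 0$.
Then for any fixed $\delta > 0$ small enough, the deformation $g_{\theta,\delta} = \theta g_\delta + (1 - \theta) g$
and any $r > 0$ we have:

\[ \limsup_{\theta \to 0}\, \theta^{-r} \int_{\mathbb{R}^d} |h \circ g_{\theta,\delta}(x) - h \circ g(x)|^r\, dx < \infty, \tag{3.18} \]

\[ \liminf_{\theta \to 0}\, \theta^{-r} \int_{\mathbb{R}^d} |h \circ g_{\theta,\delta}(x) - h \circ g(x)|^r\, dx > 0. \tag{3.19} \]

Note that $g_{\theta,\delta}$ is not a local deformation.

Proof. The statement follows from the argument for Lemma 3.26. For
a fixed $\theta$ the family $\{g_{\theta,\varepsilon}\}$ is a local deformation. Thus for $a_{\theta,\varepsilon}$ defined by:

\[ a_{\theta,\varepsilon} = \operatorname{ess\,sup} |g_{\theta,\varepsilon} - g| + \sup_{x \in B(x_0; r_\varepsilon)} |g(x) - g(x_0)|, \]

it follows that

\[ \frac{\int_{\mathbb{R}^d} |h \circ g_{\theta,\varepsilon}(x) - h \circ g(x)|^r\, dx}{\int_{\mathbb{R}^d} |g_{\theta,\varepsilon}(x) - g(x)|^r\, dx} \le \sup_{y \in B(g(x_0); a_{\theta,\varepsilon})} |h'(y)|^r, \]

\[ \frac{\int_{\mathbb{R}^d} |h \circ g_{\theta,\varepsilon}(x) - h \circ g(x)|^r\, dx}{\int_{\mathbb{R}^d} |g_{\theta,\varepsilon}(x) - g(x)|^r\, dx} \ge \inf_{y \in B(g(x_0); a_{\theta,\varepsilon})} |h'(y)|^r. \]

For $|\theta| < 1$ we have:

\[ |g_{\theta,\delta} - g| = |\theta|\, |g_\delta - g| \]

and therefore $a_{\theta,\delta} \le a_\delta$. Since $a_\varepsilon \to 0$ and $h$ is continuously differentiable,
for all $\delta > 0$ small enough we have:

\[ \sup_{y \in B(g(x_0); a_{\theta,\delta})} |h'(y)|^r \le \sup_{y \in B(g(x_0); a_\delta)} |h'(y)|^r < \infty, \]

\[ \inf_{y \in B(g(x_0); a_{\theta,\delta})} |h'(y)|^r \ge \inf_{y \in B(g(x_0); a_\delta)} |h'(y)|^r > 0. \]

Thus for all such $\theta$ we obtain:

\[ \theta^{-r} \int_{\mathbb{R}^d} |h \circ g_{\theta,\delta}(x) - h \circ g(x)|^r\, dx \le \sup_{y \in B(g(x_0); a_\delta)} |h'(y)|^r \int_{\mathbb{R}^d} |g_\delta(x) - g(x)|^r\, dx < \infty, \]

\[ \theta^{-r} \int_{\mathbb{R}^d} |h \circ g_{\theta,\delta}(x) - h \circ g(x)|^r\, dx \ge \inf_{y \in B(g(x_0); a_\delta)} |h'(y)|^r \int_{\mathbb{R}^d} |g_\delta(x) - g(x)|^r\, dx > 0, \]

which proves the lemma. □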

Proof. (Theorem 2.24). Our statement is not trivial only if the curvature
$\mathrm{curv}_{x_0}\, g > 0$, or equivalently, there exists a positive definite $d \times d$ matrix $G$ such
that the function $g$ is locally $G$-strongly convex. Then by Lemma 4.17 this
means that there exists a convex function $q$ such that in some neighborhood
$O(x_0)$ of $x_0$ we have:

\[ g(x) = \tfrac{1}{2}(x - x_0)^T G (x - x_0) + q(x). \tag{3.20} \]

The plan of the proof is the following: we introduce families of functions
$\{D_\varepsilon(g; x_0, v_0)\}$ and $\{D^*_\varepsilon(g; x_0)\}$ and prove that these families are local
deformations. Using these deformations as building blocks we construct two
types of deformations, $\{h \circ g^+_\varepsilon\}$ and $\{h \circ g^-_\varepsilon\}$, of the density $h \circ g$ which
belong to $\mathcal{P}(h)$. These deformations represent positive and negative changes
in the value of the function $g$ at the point $x_0$. After that we approximate
the Hellinger distances using Lemma 3.26. Finally, applying Corollary 2.22
we obtain lower bounds which depend on $G$. We finish the proof by taking
the supremum of the obtained lower bounds over all $G \in SC(g; x_0)$. Under
the mild assumption of strong convexity of the function $g$ both deformations
give the same rate and structure of the constant $C(d)$. However, it is
possible to obtain a larger constant $C(d)$ for the negative deformation if we
assume that $g$ is twice differentiable. Note that by the definition of $\mathcal{P}(h)$ the
function $g$ is a closed proper convex function.

Fig 1. Example of the deformation $D_\varepsilon(g; x_0, v_0)$ (curves labelled "original" and "deformation"; vertical gap $\varepsilon$).



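Let us define a function $D_\varepsilon(g; x_0, v_0)$ for a given $\varepsilon > 0$, $x_0 \in \mathrm{dom}\, g$ and
$v_0 \in \partial g(x_0)$ as:

\[ D_\varepsilon(g; x_0, v_0)(x) = \max(g(x),\, l_0(x) + \varepsilon), \]

where $l_0(x) = \langle v_0, x - x_0 \rangle + g(x_0)$ is a support plane to $g$ at $x_0$. Since $l_0 + \varepsilon$
is a support plane to $g + \varepsilon$ we have:

\[ g \le D_\varepsilon(g; x_0, v_0) \le g + \varepsilon, \]

and thus $\mathrm{dom}\, D_\varepsilon(g; x_0, v_0) = \mathrm{dom}\, g$. As a maximum of two closed convex
functions, $D_\varepsilon(g; x_0, v_0)$ is a closed convex function. For a given $x_1$ we have
$D_\varepsilon(g; x_0, v_0)(x_1) = g(x_1)$ if and only if:

\[ g(x_1) - \varepsilon \ge \langle v_0, x_1 - x_0 \rangle + g(x_0). \tag{3.21} \]

We also define a function $D^*_\varepsilon(g; x_0)$ for a given $\varepsilon > 0$ and $x_0 \in \mathrm{dom}\, g$ as
the maximal convex minorant (Appendix 4.1) of the function $g_\varepsilon$ defined as:

\[ g_\varepsilon(x) = \min\bigl(g(x),\, g(x_0) - \varepsilon + \delta(x \mid x_0)\bigr). \]

Fig 2. Example of the deformation $D^*_\varepsilon(g; x_0)$ (curves labelled "original" and "deformation").

Both functions $D_\varepsilon(g; x_0, v_0)$ and $D^*_\varepsilon(g; x_0)$ are convex by construction and,
as the next lemma shows, have similar properties. However, the argument
for $D^*_\varepsilon(g; x_0)$ is more complicated.

Lemma 3.28. Let $g$ be a closed proper convex function, $g^*$ be its convex
conjugate and $x_0 \in \mathrm{ri}(\mathrm{dom}\, g)$. Then:

1. $D^*_\varepsilon(g; x_0)$ is a closed proper convex function such that:

\[ g - \varepsilon \le D^*_\varepsilon(g; x_0) \le g \]

and $\mathrm{dom}\, D^*_\varepsilon(g; x_0) = \mathrm{dom}\, g$.

2. For a given $x_1 \in \mathrm{ri}(\mathrm{dom}\, g)$ we have $D^*_\varepsilon(g; x_0)(x_1) = g(x_1)$ if and only
if there exists $v \in \partial g(x_1)$ such that:

\[ g(x_1) + \varepsilon \le \langle v, x_1 - x_0 \rangle + g(x_0). \tag{3.22} \]

3. If $y_0 \in \partial g(x_0)$ then $x_0 \in \partial g^*(y_0)$ and:

\[ D_\varepsilon(g; x_0, y_0) = \bigl(D^*_\varepsilon(g^*; y_0)\bigr)^*. \]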

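Proof. Obviously, $g_\varepsilon \ge g - \varepsilon$. Since $g - \varepsilon$ is a closed proper convex
function it is equal to the supremum of all linear functions $l$ such that
$l \le g - \varepsilon$. Thus $g - \varepsilon \le D^*_\varepsilon(g; x_0)$, which implies that $D^*_\varepsilon(g; x_0)$ is a proper
convex function and $\mathrm{dom}\, D^*_\varepsilon(g; x_0) \subseteq \mathrm{dom}(g - \varepsilon) = \mathrm{dom}\, g$. By Lemma 4.10
we have $D^*_\varepsilon(g; x_0) \le g$, therefore $\mathrm{dom}\, g \subseteq \mathrm{dom}\, D^*_\varepsilon(g; x_0)$, which proves 1.

If $v \in \partial g(x_1)$ then $l_v(x) = \langle v, x - x_1 \rangle + g(x_1)$ is a support plane to $g(x)$
and $l_v \le g$. If inequality (3.22) is true then $l_v(x)$ is majorized by $g_\varepsilon$ and we
have:

\[ D^*_\varepsilon(g; x_0)(x_1) \le g(x_1) = l_v(x_1) \le D^*_\varepsilon(g; x_0)(x_1). \]

On the other hand, by 1 we have $x_1 \in \mathrm{ri}(\mathrm{dom}\, D^*_\varepsilon(g; x_0))$, hence there exists
$v \in \partial D^*_\varepsilon(g; x_0)(x_1)$. If $D^*_\varepsilon(g; x_0)(x_1) = g(x_1)$ then:

\[ g(x) \ge g_\varepsilon(x) \ge D^*_\varepsilon(g; x_0)(x) \ge \langle v, x - x_1 \rangle + D^*_\varepsilon(g; x_0)(x_1) = \langle v, x - x_1 \rangle + g(x_1). \]

Therefore $v \in \partial g(x_1)$. In particular:

\[ g_\varepsilon(x_0) = g(x_0) - \varepsilon \ge D^*_\varepsilon(g; x_0)(x_0) \ge \langle v, x_0 - x_1 \rangle + D^*_\varepsilon(g; x_0)(x_1) = \langle v, x_0 - x_1 \rangle + g(x_1), \]

which proves 2.

We can represent $D^*_\varepsilon(g; x_0)$ as the maximal convex minorant of $g_\varepsilon$ defined
by:

\[ g_\varepsilon = \min\bigl(g,\, g(x_0) - \varepsilon + \delta(\cdot \mid x_0)\bigr). \]

For $v \in \partial g(x_0)$, by Lemma 4.10, $g^*(v) + g(x_0) = \langle v, x_0 \rangle$. Thus

\[ \bigl(g(x_0) - \varepsilon + \delta(\cdot \mid x_0)\bigr)^*(y) = \langle x_0, y \rangle - g(x_0) + \varepsilon = \langle x_0, y - v \rangle + g^*(v) + \varepsilon. \]

By Lemma 4.7 we have:

\[ \bigl(D^*_\varepsilon(g; x_0)\bigr)^* = \max(g^*,\, l_0 + \varepsilon), \qquad l_0(y) = \langle x_0, y - v \rangle + g^*(v), \]

that is, $\bigl(D^*_\varepsilon(g; x_0)\bigr)^* = D_\varepsilon(g^*; v, x_0)$. Applying this identity to $g^*$ in place of $g$
and using $g^{**} = g$ gives 3, which concludes the proof of the lemma. □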

Since the domain of the quadratic part of equation (3.20) is $\mathbb{R}^d$, by
Lemma 4.11 we have that for any $x \in \mathrm{dom}\, g$ and $v \in \partial g(x)$ there exists
$w \in \partial q(x)$ such that:

\[ v = G(x - x_0) + w. \tag{3.23} \]

Therefore for a point $x_1$ in the neighborhood $O(x_0)$ where the decomposition
(3.20) is true, condition (3.21) is equivalent to:

\[ \tfrac{1}{2}(x_1 - x_0)^T G (x_1 - x_0) + q(x_1) - \varepsilon \ge \langle w_0, x_1 - x_0 \rangle + q(x_0), \]

where $w_0$ corresponds to $y_0$ in the decomposition (3.23). Since $\langle w_0, x_1 - x_0 \rangle
+ q(x_0)$ is a support plane to $q(x)$, the inequality (3.21) is satisfied if

\[ \tfrac{1}{2}(x_1 - x_0)^T G (x_1 - x_0) \ge \varepsilon, \]

which is the complement of an open ellipsoid $B_G(x_0, \sqrt{2\varepsilon})$ defined by $G$
with the center at $x_0$. For $\varepsilon$ small enough this ellipsoid will belong to the
neighborhood $O(x_0)$. Since $|D_\varepsilon(g; x_0, y_0) - g| \le \varepsilon$ this proves that the family
$D_\varepsilon(g; x_0, y_0)$ is a local deformation.

In the same way, the condition (3.22) is equivalent to:

\[ \tfrac{1}{2}(x_1 - x_0)^T G (x_1 - x_0) + q(x_1) + \varepsilon \le \langle G(x_1 - x_0) + w_1, x_1 - x_0 \rangle + q(x_0) \]

or:

\[ \tfrac{1}{2}(x_1 - x_0)^T G (x_1 - x_0) + q(x_0) - \varepsilon \ge \langle w_1, x_0 - x_1 \rangle + q(x_1), \]

which is satisfied if we have:

\[ \tfrac{1}{2}(x_1 - x_0)^T G (x_1 - x_0) \ge \varepsilon. \]

Since $|D^*_\varepsilon(g; x_0) - g| \le \varepsilon$ this proves that the family $D^*_\varepsilon(g; x_0)$ is also a local
deformation. Thus we proved:

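Lemma 3.29. Let $g$ be a closed proper convex function locally $G$-strongly
convex at some $x_0 \in \mathrm{ri}\, \mathrm{dom}\, g$ and $y_0 \in \partial g(x_0)$. Then the families $D_\varepsilon(g; x_0, y_0)$
and $D^*_\varepsilon(g; x_0)$ are local deformations for all $\varepsilon > 0$ small enough. Moreover,
the condition:

\[ \tfrac{1}{2}(x - x_0)^T G (x - x_0) \ge \varepsilon \]

implies

\[ D_\varepsilon(g; x_0, y_0)(x) = D^*_\varepsilon(g; x_0)(x) = g(x). \]

Or equivalently, $\mathrm{supp}[D_\varepsilon(g; x_0, y_0) - g]$ and $\mathrm{supp}[D^*_\varepsilon(g; x_0) - g]$ are subsets
of $B_G(x_0, \sqrt{2\varepsilon})$.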
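
The two deformations and the support property of Lemma 3.29 can be illustrated on a grid in one dimension. This is a minimal sketch under assumptions chosen only for illustration: $G = 2$, $x_0 = 0.5$, $q(x) = |x|$, so $g(x) = (x - x_0)^2 + |x|$ and $v_0 = 1 \in \partial g(x_0)$. The names `d_eps`, `d_star` and `check` are hypothetical. In one dimension the convex hull of $\mathrm{epi}(g)$ and the lowered point $(x_0, g(x_0) - \varepsilon)$ is swept by segments from that point, which is what `d_star` evaluates over grid points.

```python
G, X0 = 2.0, 0.5                      # illustrative curvature constant and base point

def q(x):                             # convex remainder in the decomposition (3.20)
    return abs(x)

def g(x):                             # g(x) = (1/2) G (x - x0)^2 + q(x)
    return 0.5 * G * (x - X0) ** 2 + q(x)

V0 = 1.0                              # a subgradient of g at x0: G * 0 + q'(0.5) = 1

def d_eps(x, eps):
    """D_eps(g; x0, v0)(x) = max(g(x), l0(x) + eps)."""
    l0 = V0 * (x - X0) + g(X0)
    return max(g(x), l0 + eps)

def d_star(x, eps, grid):
    """Grid approximation of D*_eps(g; x0)(x), the maximal convex minorant of g_eps."""
    low = g(X0) - eps                 # value of g_eps at x0
    if x == X0:
        return low
    best = g(x)
    for y in grid:
        # segments from (x0, low) to (y, g(y)) that pass over x
        if (y - X0) * (x - X0) > 0 and abs(y - X0) >= abs(x - X0):
            best = min(best, low + (x - X0) * (g(y) - low) / (y - X0))
    return best

def check(eps, n=401, half_width=2.0, tol=1e-12):
    grid = [X0 - half_width + 2.0 * half_width * i / (n - 1) for i in range(n)]
    for x in grid:
        up, down = d_eps(x, eps), d_star(x, eps, grid)
        assert g(x) - tol <= up <= g(x) + eps + tol        # g <= D_eps <= g + eps
        assert g(x) - eps - tol <= down <= g(x) + tol      # g - eps <= D*_eps <= g
        if 0.5 * G * (x - X0) ** 2 >= eps:                 # outside B_G(x0, sqrt(2 eps))
            assert abs(up - g(x)) <= tol and abs(down - g(x)) <= tol
    return True

print(check(0.04), check(0.01))
```

The quadratic-time loop in `d_star` is deliberately naive; it avoids a separate convex-hull routine and is adequate for a few hundred grid points.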

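For $r > 0$ small enough $h' \circ g(x)$ is positive and the decomposition (3.20)
is true on $B(x_0; r)$. Let us fix some $y_0 \in \partial g(x_0)$, some $x_1 \in B(x_0; r)$ such
that $x_1 \ne x_0$, and some $y_1 \in \partial g(x_1)$. We fix $\delta$ such that equation (3.18)
of Lemma 3.27 is true for the transformation $\sqrt{h}$ and $r = 2$ and also $x_0 \notin
B_G(x_1; \sqrt{2\delta})$. Then by Lemma 3.29, for all $\varepsilon > 0$ small enough, the support
sets $\mathrm{supp}[D_\varepsilon(g; x_0, y_0) - g]$ and $\mathrm{supp}[D^*_\delta(g; x_1) - g]$ do not intersect; i.e. these
two deformations do not interfere.

Now, we can prove Theorem 2.24. The argument below is identical for $g^+_\varepsilon$
and $g^-_\varepsilon$, so we will give the proof only for $g^+_\varepsilon$. We define deformations $g^+_\varepsilon$
and $g^-_\varepsilon$ by means of the following lemma:

Lemma 3.30. For all $\varepsilon > 0$ small enough there exist $\theta^+_\varepsilon, \theta^-_\varepsilon \in (0, 1)$ such
that the functions $g^+_\varepsilon$ and $g^-_\varepsilon$ defined by:

\[ g^+_\varepsilon = (1 - \theta^+_\varepsilon) D_\varepsilon(g; x_0, v_0) + \theta^+_\varepsilon D^*_\delta(g; x_1), \]
\[ g^-_\varepsilon = (1 - \theta^-_\varepsilon) D^*_\varepsilon(g; x_0) + \theta^-_\varepsilon D_\delta(g; x_1, v_1) \]

belong to $\mathcal{P}(h)$.

Proof. By dominated convergence, the function $F(\theta)$ defined by:

\[ F(\theta) = \int h \circ \bigl((1 - \theta) D_\varepsilon(g; x_0, v_0) + \theta D^*_\delta(g; x_1)\bigr)\, dx \]

is continuous. We have:

\[ F(0) = \int h \circ D_\varepsilon(g; x_0, v_0)\, dx > \int h \circ g\, dx = 1, \]
\[ F(1) = \int h \circ D^*_\delta(g; x_1)\, dx < \int h \circ g\, dx = 1. \]

Therefore there exists $\theta^+_\varepsilon \in (0, 1)$ such that $F(\theta^+_\varepsilon) = 1$. □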

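Next, we will show that $\theta^+_\varepsilon$ goes to zero fast enough so that $g^+_\varepsilon$ is very
close to $D_\varepsilon(g; x_0, v_0)$. Since the supports do not intersect we have:

\[
0 = \int (h \circ g^+_\varepsilon - h \circ g)\, dx
= \int \bigl(h \circ ((1 - \theta^+_\varepsilon) D_\varepsilon(g; x_0, v_0) + \theta^+_\varepsilon g) - h \circ g\bigr)\, dx
- \int \bigl(h \circ g - h \circ ((1 - \theta^+_\varepsilon) g + \theta^+_\varepsilon D^*_\delta(g; x_1))\bigr)\, dx,
\]

where both integrals have the same sign. For the first integral by Lemma 3.26
we have:

\[
\int \bigl|h \circ ((1 - \theta^+_\varepsilon) D_\varepsilon(g; x_0, v_0) + \theta^+_\varepsilon g) - h \circ g\bigr|\, dx
\le \int |h \circ D_\varepsilon(g; x_0, v_0) - h \circ g|\, dx
\asymp \int |g - D_\varepsilon(g; x_0, v_0)|\, dx \le \varepsilon\, \mu[B_G(x_0; \sqrt{2\varepsilon})].
\]

The second integral is monotone in $\theta^+_\varepsilon$, and by Lemma 3.27 we have:

\[ \int \bigl(h \circ g - h \circ ((1 - \theta^+_\varepsilon) g + \theta^+_\varepsilon D^*_\delta(g; x_1))\bigr)\, dx \asymp \theta^+_\varepsilon. \]

Thus we have $\theta^+_\varepsilon = O(\varepsilon^{1 + d/2})$ and:

\[ \lim_{\varepsilon \to 0} \varepsilon^{-1} \bigl(g^+_\varepsilon(x_0) - g(x_0)\bigr) = \lim_{\varepsilon \to 0} (1 - \theta^+_\varepsilon) = 1. \]

For the Hellinger distance, since the two perturbations have disjoint supports, we have:

\[
H(h \circ g^+_\varepsilon, h \circ g) \le H\bigl(h \circ ((1 - \theta^+_\varepsilon) D_\varepsilon(g; x_0, v_0) + \theta^+_\varepsilon g),\, h \circ g\bigr)
+ H\bigl(h \circ ((1 - \theta^+_\varepsilon) g + \theta^+_\varepsilon D^*_\delta(g; x_1)),\, h \circ g\bigr).
\]

Now, we can apply Lemma 3.26:

\[ H^2\bigl(h \circ ((1 - \theta^+_\varepsilon) D_\varepsilon(g; x_0, v_0) + \theta^+_\varepsilon g),\, h \circ g\bigr) \le H^2\bigl(h \circ D_\varepsilon(g; x_0, v_0),\, h \circ g\bigr), \]

\[ \lim_{\varepsilon \to 0} \frac{H^2\bigl(h \circ D_\varepsilon(g; x_0, v_0),\, h \circ g\bigr)}{\int (D_\varepsilon(g; x_0, v_0) - g)^2\, dx} = \frac{h' \circ g(x_0)^2}{4\, h \circ g(x_0)}, \]

and

\[ \int (D_\varepsilon(g; x_0, v_0) - g)^2\, dx \le \varepsilon^2 \mu[B_G(x_0; \sqrt{2\varepsilon})] = \varepsilon^{2 + d/2}\, \frac{2^{d/2} \mu[S(0, 1)]}{\sqrt{\det G}}. \]

This yields

\[
\limsup_{\varepsilon \to 0}\, \varepsilon^{-\frac{d+4}{4}} H\bigl(h \circ ((1 - \theta^+_\varepsilon) D_\varepsilon(g; x_0, v_0) + \theta^+_\varepsilon g),\, h \circ g\bigr)
\le C(d) \left[ \frac{h' \circ g(x_0)^4}{h \circ g(x_0)^2 \det G} \right]^{1/4},
\]

where $S(0, 1)$ is the $d$-dimensional sphere of radius 1.
For the second part by Lemma 3.27 we obtain:

\[ \limsup_{\varepsilon \to 0}\, (\theta^+_\varepsilon)^{-2} H^2\bigl(h \circ ((1 - \theta^+_\varepsilon) g + \theta^+_\varepsilon D^*_\delta(g; x_1)),\, h \circ g\bigr) < \infty, \]

and

\[ H\bigl(h \circ ((1 - \theta^+_\varepsilon) g + \theta^+_\varepsilon D^*_\delta(g; x_1)),\, h \circ g\bigr) = O(\varepsilon^{1 + d/2}). \]

Thus:

\[ \limsup_{\varepsilon \to 0}\, \varepsilon^{-\frac{d+4}{4}} H(h \circ g^+_\varepsilon, h \circ g) \le C(d) \left[ \frac{h' \circ g(x_0)^4}{h \circ g(x_0)^2 \det G} \right]^{1/4}. \]

Finally, we apply Corollary 2.22:

\[ \liminf_{n \to \infty}\, n^{\frac{2}{d+4}} R_1(n; T, \{g, g_n\}) \ge C(d) \left[ \frac{h \circ g(x_0)^2 \det G}{h' \circ g(x_0)^4} \right]^{\frac{1}{d+4}}. \]

Taking the supremum over all $G \in SC(g; x_0)$ we obtain the statement of the
theorem. □

3.6. Mode estimation.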

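Proof. (Theorem 2.27). The proof is similar to the proof of the lower
bounds for estimation at a point. The deformation we will construct will resemble
$g^-_\varepsilon$.

Our statement is not trivial only if the curvature $\mathrm{curv}_{x_0}\, g > 0$, or equivalently,
there exists a positive definite $d \times d$ matrix $G$ such that the function
$g$ is locally $G$-strongly convex. For $a > 0$ small enough $h' \circ g(x)$ is positive
and the decomposition (3.20) is true on $B(x_0; a)$. Let us fix some $y_0 \in \partial g(x_0)$,
some $x_1 \in B(x_0; a)$ such that $x_1 \ne x_0$ and some $y_1 \in \partial g(x_1)$. We fix $\delta$ such
that equation (3.18) of Lemma 3.27 is true for the transformation $\sqrt{h}$ and
$r = 2$ and also $x_0 \notin B_G(x_1; \sqrt{2\delta})$.

Let us consider the deformation $D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u)$ where $u \in \mathbb{R}^d$ is an
arbitrary fixed vector with $\|u\| = 1$ and

\[ \xi(\varepsilon) = g(x_0 + \varepsilon u) - g(x_0) + \varepsilon^{\gamma + 1}. \]

Since the value of $D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u)$ at any point $x$ is a convex combination
of values of $g_{\xi(\varepsilon)}$, $g(x) \ge g(x_0)$ and

\[ D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u)(x_0 + \varepsilon u) = g(x_0) - \varepsilon^{\gamma + 1}, \]

the global minimum of $D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u)$ is attained at $x_0 + \varepsilon u$. By Lemma 3.29 for all
$\varepsilon > 0$ small enough we have

\[ \mathrm{supp}[D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) - g] \subseteq B_G\bigl(x_0 + \varepsilon u, \sqrt{2\xi(\varepsilon)}\bigr). \]

Since, by assumption,

\[ \xi(\varepsilon) \le L \varepsilon^{\gamma} + \varepsilon^{\gamma + 1}, \]

the support $\mathrm{supp}[D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) - g]$ converges to the point $x_0$ and thus
does not intersect $\mathrm{supp}[D_\delta(g; x_1, y_1) - g]$ for $\varepsilon$ small enough; i.e. these two
deformations do not interfere.

The same argument as in Lemma 3.30 shows that there exists $\theta^m_\varepsilon \in (0, 1)$
such that the deformation $g^m_\varepsilon$ defined as:

\[ g^m_\varepsilon = (1 - \theta^m_\varepsilon) D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) + \theta^m_\varepsilon D_\delta(g; x_1, y_1) \]

belongs to $\mathcal{P}(h)$. Also $g^m_\varepsilon \ge D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u)$ and the global minimum of $g^m_\varepsilon$
is attained at $x_0 + \varepsilon u$.

Next, we will show that $\theta^m_\varepsilon$ goes to zero fast enough so that $g^m_\varepsilon$ is very
close to $D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u)$. We have:

\[
0 = \int (h \circ g^m_\varepsilon - h \circ g)\, dx
= -\int \bigl(h \circ g - h \circ ((1 - \theta^m_\varepsilon) D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) + \theta^m_\varepsilon g)\bigr)\, dx
+ \int \bigl(h \circ (\theta^m_\varepsilon D_\delta(g; x_1, y_1) + (1 - \theta^m_\varepsilon) g) - h \circ g\bigr)\, dx,
\]

where both integrals have the same sign. For the first integral by Lemma 3.26
we have:

\[
\int \bigl|h \circ g - h \circ ((1 - \theta^m_\varepsilon) D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) + \theta^m_\varepsilon g)\bigr|\, dx
\le \int \bigl|h \circ g - h \circ D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u)\bigr|\, dx
\asymp \int \bigl|D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) - g\bigr|\, dx
\]
\[
\le \xi(\varepsilon)\, \mu\bigl[B_G(x_0 + \varepsilon u; \sqrt{2\xi(\varepsilon)})\bigr] = O\bigl(\xi(\varepsilon)^{1 + d/2}\bigr).
\]

The second integral is monotone in $\theta^m_\varepsilon$ and by Lemma 3.27 we have:

\[ \int \bigl(h \circ (\theta^m_\varepsilon D_\delta(g; x_1, y_1) + (1 - \theta^m_\varepsilon) g) - h \circ g\bigr)\, dx \asymp \theta^m_\varepsilon, \]

thus we have

\[ \theta^m_\varepsilon = O\bigl(\varepsilon^{\gamma(1 + d/2)}\bigr). \]

For the Hellinger distance, since the two perturbations have disjoint supports, we have:

\[
H(h \circ g^m_\varepsilon, h \circ g) \le H\bigl(h \circ ((1 - \theta^m_\varepsilon) D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) + \theta^m_\varepsilon g),\, h \circ g\bigr)
+ H\bigl(h \circ (\theta^m_\varepsilon D_\delta(g; x_1, y_1) + (1 - \theta^m_\varepsilon) g),\, h \circ g\bigr).
\]

For the first part we can apply Lemma 3.26:

\[ H^2\bigl(h \circ ((1 - \theta^m_\varepsilon) D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) + \theta^m_\varepsilon g),\, h \circ g\bigr) \le H^2\bigl(h \circ D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u),\, h \circ g\bigr), \]

\[ \lim_{\varepsilon \to 0} \frac{H^2\bigl(h \circ D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u),\, h \circ g\bigr)}{\int (D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) - g)^2\, dx} = \frac{h' \circ g(x_0)^2}{4\, h \circ g(x_0)}, \]

and

\[ \int (D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) - g)^2\, dx \le \xi(\varepsilon)^2 \mu\bigl[B_G(x_0 + \varepsilon u; \sqrt{2\xi(\varepsilon)})\bigr] = \xi(\varepsilon)^{2 + d/2}\, \frac{2^{d/2} \mu[S(0, 1)]}{\sqrt{\det G}}, \]

which gives:

\[
\limsup_{\varepsilon \to 0}\, \varepsilon^{-\gamma(1 + d/4)} H\bigl(h \circ ((1 - \theta^m_\varepsilon) D^*_{\xi(\varepsilon)}(g; x_0 + \varepsilon u) + \theta^m_\varepsilon g),\, h \circ g\bigr)
\le C(d) L^{1 + d/4} \left[ \frac{h' \circ g(x_0)^4}{h \circ g(x_0)^2 \det G} \right]^{1/4},
\]

where $S(0, 1)$ is the $d$-dimensional sphere of radius 1.
For the second part by Lemma 3.27 we obtain:

\[ \limsup_{\varepsilon \to 0}\, (\theta^m_\varepsilon)^{-2} H^2\bigl(h \circ ((1 - \theta^m_\varepsilon) g + \theta^m_\varepsilon D_\delta(g; x_1, y_1)),\, h \circ g\bigr) < \infty, \]

\[ H\bigl(h \circ ((1 - \theta^m_\varepsilon) g + \theta^m_\varepsilon D_\delta(g; x_1, y_1)),\, h \circ g\bigr) = O\bigl(\varepsilon^{\gamma(1 + d/2)}\bigr). \]

Thus:

\[ \limsup_{\varepsilon \to 0}\, \varepsilon^{-\gamma(1 + d/4)} H(h \circ g^m_\varepsilon, h \circ g) \le C(d) L^{1 + d/4} \left[ \frac{h' \circ g(x_0)^4}{h \circ g(x_0)^2 \det G} \right]^{1/4}. \]

Finally, we apply Corollary 2.22:

\[ \liminf_{n \to \infty}\, n^{\frac{2}{\gamma(d+4)}} R_s(n; T, \{p, p_n\}) \ge C(d) L^{-\frac{1}{\gamma}} \left[ \frac{h \circ g(x_0)^2 \det G}{h' \circ g(x_0)^4} \right]^{\frac{1}{\gamma(d+4)}}. \]

Taking the supremum over all $G \in SC(g; x_0)$ we obtain the statement of the
theorem. □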
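
The exponents of $n$ in the two lower bounds can be cross-checked with exact arithmetic. The sketch below is only a bookkeeping aid and rests on an assumption not restated in this section: that the two-point bound is applied with the perturbation size $\varepsilon_n$ chosen so that $n H^2$ stays of constant order. The helper name `rate_exponent` is hypothetical; $\gamma = 1$ corresponds to estimation at a point, and general $\gamma$ to the mode.

```python
from fractions import Fraction

def rate_exponent(d, gamma=Fraction(1)):
    """Exponent a with eps_n ~ n**(-a), when H(eps) ~ eps**(gamma*(d+4)/4)
    and eps_n is chosen to keep n * H(eps_n)**2 of constant order."""
    hellinger_power = gamma * Fraction(d + 4, 4)   # H ~ eps**hellinger_power
    return 1 / (2 * hellinger_power)               # solves n * eps**(2*power) ~ 1

print(rate_exponent(1))                  # estimation at a point, d = 1
print(rate_exponent(2, Fraction(2)))     # mode estimation with gamma = 2, d = 2
```

In both cases the result equals $2/(\gamma(d+4))$, matching the powers of $n$ displayed above.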

3.7. Indications of Proofs for Conjectured Rates. 

4. Appendix: some results from convex analysis. We will use the 
following general properties of convex sets and convex functions. We use 
Rockafellar [1970] as a reference. 

Lemma 4.1. For any convex set $A$ in $\mathbb{R}^d$ we have:

1. The boundary of $A$ has Lebesgue measure zero.

2. $A$ has Lebesgue measure zero if and only if it belongs to a $(d - 1)$-dimensional
affine subspace.

3. $A$ has Lebesgue measure $+\infty$ if and only if it is unbounded and has
dimension $d$.



Proof.

1. If $A$ is such that $\mathrm{cl}(A)$ has finite Lebesgue measure then:

\[ \partial A \subseteq (1 + \varepsilon)\, \mathrm{cl}(A) \setminus (1 - \varepsilon)\, \mathrm{cl}(A), \quad \varepsilon \in (0, 1), \]
\[ \mu[\partial A] \le \bigl((1 + \varepsilon)^d - (1 - \varepsilon)^d\bigr)\, \mu[\mathrm{cl}(A)], \]

and thus $\mu[\partial A] = 0$. Since $\mathbb{R}^d$ is a countable union of closed convex
cubes $B_i$, the result for an arbitrary convex set $A$ follows from:

\[ \partial A \subseteq \bigcup_i \partial(A \cap B_i). \]

2. If $A$ has dimension $k \le d$ then its affine hull $V$ has dimension $k$ and $A$
contains a $k$-dimensional simplex $D$ (Theorem 2.4 Rockafellar [1970]).
Then if $k = d$ we have $\mu[D] > 0$ and if $k < d$ we have $\mu[V] = 0$.

3. Part 1 implies that it is enough to consider closed convex sets. Part 2
implies that it is enough to prove that an unbounded closed convex
set of dimension $d$ has Lebesgue measure $+\infty$. Let $A$ be such a set.
Then $A$ contains a $d$-dimensional simplex $D$ (Theorem 2.4 Rockafellar [1970]),
which has non-zero Lebesgue measure. Since $A$ is unbounded, its recession
cone contains a nonzero direction (Theorem 8.4 Rockafellar [1970]) and
therefore we can choose a direction $v$ such that $D + \lambda v \subseteq A$ for all
$\lambda > 0$, which implies $\mu[A] = +\infty$.

□

The following lemma shows that convergence of convex sets in measure 
implies pointwise convergence. 

Lemma 4.2. Let $A$ be a convex set in $\mathbb{R}^d$ such that $\dim(A) = d$ and
$\mathrm{ri}(A) \ne \emptyset$. Then:

1. Suppose a sequence of convex sets $B_n$ is such that $A \subseteq B_n$ and $\lim \mu[B_n \setminus
A] = 0$; then $\limsup \mathrm{cl}(B_n) = \mathrm{cl}(A)$;

2. Suppose a sequence of convex sets $C_n$ is such that $C_n \subseteq A$ and $\lim \mu[A \setminus
C_n] = 0$; then $\liminf \mathrm{ri}(C_n) = \mathrm{ri}(A)$.

Proof. By Lemma 4.1 we can assume that $A$, $B_n$ and $C_n$ are closed
convex sets.

1. If on the contrary, there exists a subsequence $\{k\}$ such that for some
$x \in A^c$ we have $x \in \bigcap_{k \ge 1} B_k$, then for $xA = \mathrm{conv}(\{x\} \cup A)$ we have:

\[ xA \subseteq B_k, \]
\[ \mu[B_k \setminus A] \ge \mu[xA \setminus A]. \]

Since $A$ is closed there exists a ball $B(x)$ such that $B(x) \cap A = \emptyset$.
Since $\mathrm{ri}(A) \ne \emptyset$ there exists a ball $B(x_0)$ such that $B(x_0) \subseteq A$ for
some $x_0 \in \mathrm{ri}(A)$. Then for $xB = \mathrm{conv}(\{x\} \cup B(x_0))$ we have:

\[ xB \subseteq xA, \]
\[ \mu[xA \setminus A] \ge \mu[xB \cap B(x)] > 0. \]

This contradiction implies $\limsup B_i = A$.

2. If on the contrary, there exists a point $x \in \mathrm{ri}(A)$ and a subsequence $\{k\}$
such that $x \notin C_k$ for all $k$, then for each $C_k$ there exists a half-space $L_k$
such that $x \in L_k$ and $C_k \subseteq L_k^c$. Let $B(x)$ be a ball such that $B(x) \subseteq A$.
We have:

\[ \mu[A \setminus C_k] \ge \mu[A \cap L_k] \ge \mu[B(x) \cap L_k] = \mu[B(x)]/2 > 0. \]

This contradiction implies $\mathrm{ri}(A) \subseteq \liminf C_i$.

□

Our next lemma shows that the Lebesgue measure of sublevel sets of a 
convex function grows at most polynomially. 

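Lemma 4.3. Let $g$ be a convex function and let values $y_1 < y_2 < y_3$ be
such that $\mathrm{lev}_{y_1}\, g \ne \emptyset$. Then we have:

\[ \mu[\mathrm{lev}_{y_3}\, g] \le \left[ \frac{y_3 - y_1}{y_2 - y_1} \right]^d \mu[\mathrm{lev}_{y_2}\, g]. \tag{4.24} \]

Proof. By assumption we have:

\[ \mu[\mathrm{lev}_{y_3}\, g] \ge \mu[\mathrm{lev}_{y_2}\, g] \ge \mu[\mathrm{lev}_{y_1}\, g] \ge 0. \]

Let us consider the set $L$ defined as:

\[ L = \{x_1 + k(x - x_1) \mid x \in \mathrm{lev}_{y_2}\, g\}, \]

where $x_1$ is any fixed point such that $g(x_1) = y_1$ and

\[ k = \frac{y_3 - y_1}{y_2 - y_1}. \]

Then:

\[ \mu[L] = k^d \mu[\mathrm{lev}_{y_2}\, g], \]

and therefore it is enough to prove that $\mathrm{lev}_{y_3}\, g \subseteq L$.
If $x_3 \in \mathrm{lev}_{y_3}\, g$ then for $x_2 = x_1 + (x_3 - x_1)/k$ we have:

\[ x_3 = x_1 + k(x_2 - x_1), \]
\[ g(x_2) \le (1 - 1/k)\, g(x_1) + (1/k)\, g(x_3) \le (1 - 1/k)\, y_1 + (1/k)\, y_3 = y_2, \]

and thus $x_2 \in \mathrm{lev}_{y_2}\, g$.

□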
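
As a quick sanity check of (4.24), consider an example that is not in the text: $g(x) = \|x\|^2$ on $\mathbb{R}^2$, whose sublevel set $\mathrm{lev}_y\, g$ is a disc of area $\pi y$. The sketch below, with hypothetical helper names `disc_area` and `bound_4_24_holds`, tests the inequality on random triples $y_1 < y_2 < y_3$.

```python
import math
import random

def disc_area(y):
    """Lebesgue measure of lev_y g for g(x) = |x|^2 on R^2 (empty set if y < 0)."""
    return math.pi * y if y >= 0 else 0.0

def bound_4_24_holds(y1, y2, y3, d=2):
    """Check mu[lev_{y3} g] <= ((y3 - y1)/(y2 - y1))^d * mu[lev_{y2} g]."""
    k = (y3 - y1) / (y2 - y1)
    # small relative slack guards against rounding when y2 is close to y3
    return disc_area(y3) <= (k ** d) * disc_area(y2) * (1.0 + 1e-12)

random.seed(0)
triples = [sorted(random.uniform(0.0, 10.0) for _ in range(3)) for _ in range(1000)]
print(all(bound_4_24_holds(*t) for t in triples if t[0] < t[1]))
```

For this particular $g$ the bound is far from tight unless $y_2$ is close to $y_3$, which is consistent with the lemma giving only polynomial growth control.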



Corollary 4.4. If $g$ is a convex function then the function $y \mapsto \mu[\mathrm{lev}_y\, g]$ is
continuous on $(\inf g, \sup g)$.

4.1. Maximal convex minorant. In this section we describe the convex
function $f_c$ which is in some sense the closest to a given function $f$.

Definition 4.5. The maximal convex minorant $f_c$ of a proper function
$f$ is the supremum of all linear functions $l$ such that $l \le f$.

It is possible that $f$ does not majorize any linear function, and then $f_c \equiv
-\infty$. However, if this is not the case, the following properties of the maximal
convex minorant hold:

Lemma 4.6. Let $f$ be a function and $f_c \not\equiv -\infty$ be its maximal convex
minorant. Then:

1. $f_c$ is a closed proper convex function;

2. if $f$ is a proper convex function then $f_c$ is its closure;

3. $f_c \le f$;

4. $(f_c)^*(y) = \sup_{x \in \mathbb{R}^d} (\langle y, x \rangle - f(x))$.

Proof. This follows from Corollary 12.1.1 Rockafellar [1970]. □

The maximal convex minorant allows us to see an important duality between
the operations of pointwise minimum and pointwise maximum.

Lemma 4.7. Let $f_i$ be proper convex functions and let $g = \inf_i f_i$ be
the pointwise infimum of the $f_i$. Then $(g_c)^* = \sup_i f_i^*$.

Proof. This follows from Theorem 16.5 Rockafellar [1970]. □

4.2. Subdifferential.

Definition 4.8. The subdifferential $\partial h(x)$ of a convex function $h$ at the
point $x$ is the set of all vectors $v$ which satisfy the inequality

\[ h(z) \ge \langle v, z - x \rangle + h(x) \]

for all $z$.

Obviously $\partial h(x)$ is a closed convex set. It might be empty, but if it is not,
the function $h$ is called subdifferentiable at $x$.

Lemma 4.9. Let $h$ be a proper convex function. Then for $x \in \mathrm{ri}\, \mathrm{dom}\, h$ the
subdifferential $\partial h(x)$ is not empty.

Proof. This follows from Theorem 23.4 Rockafellar [1970]. □

Lemma 4.10. Let $h$ be a closed proper convex function and $x$ any point.
Then the following conditions on $x^*$ are equivalent:

1. $x^* \in \partial h(x)$;

2. $l(z) = \langle x^*, z \rangle - h^*(x^*)$ is a support plane for $\mathrm{epi}(h)$ at $x$;

3. $h(x) + h^*(x^*) = \langle x^*, x \rangle$;

4. $x \in \partial h^*(x^*)$;

5. $l(z) = \langle x, z \rangle - h(x)$ is a support plane for $\mathrm{epi}(h^*)$ at $x^*$.

Proof. This follows from Theorem 23.5 Rockafellar [1970]. □

Lemma 4.11. Let $h_1$ and $h_2$ be proper convex functions such that $\mathrm{ri}\, \mathrm{dom}\, h_1 \cap
\mathrm{ri}\, \mathrm{dom}\, h_2 \ne \emptyset$. Then $\partial(h_1 + h_2) = \partial h_1 + \partial h_2$ for all $x$.

Proof. This follows from Theorem 23.8 Rockafellar [1970]. □

4.3. Polyhedral functions.

Definition 4.12. A polyhedral convex set is a set which can be expressed
as an intersection of finitely many half-spaces. A polyhedral convex function
is a convex function whose epigraph is polyhedral.

From Theorem 19.1 Rockafellar [1970] we have that the epigraph of a
polyhedral function $h : \mathbb{R}^d \to \overline{\mathbb{R}}$ has a finite number of extremal points and
faces. We call projections of extremal points the knots of $h$ and projections
of the nonvertical $d$-dimensional faces the facets of $h$. Thus the set of knots
and the set of facets of a polyhedral function are always finite. Moreover, by
Theorem 18.3 Rockafellar [1970] the knots are the extremal points of the
facets. Finally, let $\{C_i\}$ be the set of facets of a polyhedral function $h$; then:

\[ \mathrm{dom}\, h = \bigcup_i C_i, \]
\[ \mathrm{ri}(C_i) \cap \mathrm{ri}(C_j) = \emptyset \quad \text{for } i \ne j, \]

and on $\mathrm{dom}\, h$ we have $h = \max_i (l_i)$ where the $l_i$ are linear functions. For each
$C_i$ there exists $l_i$ such that:

\[ C_i = \{x \mid h(x) = l_i(x)\}. \]

Lemma 4.13. Let $h$ be a polyhedral convex function and $x \in \mathrm{dom}\, h$. Then
$\partial h(x) \ne \emptyset$.

Proof. This follows from Theorem 23.10 Rockafellar [1970]. □

Lemma 4.14. For a set of points $x = \{x_i\}_{i=1}^n$ such that $x_i \in \mathbb{R}^d$ and
any point $p \in \mathbb{R}^n$, consider the family of all convex functions $h$ with $\mathrm{ev}_x h =
p$. The unique maximal element $U_x^p$ in this family is a polyhedral convex
function with domain $\mathrm{dom}\, U_x^p = \mathrm{conv}(x)$ and the set of knots $K \subseteq x$.

Proof. The points $(x_i, p_i)$ and the direction $(0, 1)$ belong to the epigraph of any
convex function $h$ in our family, and so does the convex hull $U$ of these points and
direction. By construction $U$ is the epigraph of some closed proper convex
function $U_x^p$ such that $\mathrm{dom}\, U_x^p = \mathrm{conv}(x)$; by Theorem 19.1 Rockafellar
[1970] this function is polyhedral; by Corollary 18.3.1 Rockafellar [1970] the
set of its knots $K$ belongs to $x$; and since $\mathrm{epi}(U_x^p) = U \subseteq \mathrm{epi}(h)$ we have
$h \le U_x^p$. On the other hand, since $(x_i, p_i) \in U$ we have

\[ p_i = h(x_i) \le U_x^p(x_i) \le p_i \]

and therefore $U_x^p(x_i) = p_i$, which proves the lemma. □

Lemma 4.15. For a set of points $x = \{x_i\}_{i=1}^n$, a convex set $C$ such that
$x_i \in \mathrm{ri}(C)$ and any point $p \in \mathbb{R}^n$, consider the family of all convex functions $h$
with $\mathrm{ev}_x h = p$ and $C \subseteq \mathrm{dom}\, h$. Any minimal element $L_x^p$ in this family is a
polyhedral convex function with $\mathrm{dom}\, L_x^p = \mathbb{R}^d$. For each facet $C$ of $L_x^p$, $\mathrm{ri}(C)$
contains at least one element of $x$.

Proof. For any function $h$ in our family let us consider the set of linear
functions $l_i$ such that $l_i(x_i) = h(x_i) = p_i$ and $l_i \le h$, and which correspond
to arbitrarily chosen nonvertical support planes for $\mathrm{epi}(h)$ at $x_i$. Then $L =
\max_i (l_i)$ is polyhedral, and since $l_j(x_i) \le h(x_i) = p_i$ we have $L(x_i) = p_i$. We
also have $\mathrm{dom}\, L = \mathbb{R}^d$. If the interior of any facet $C_i$ of $L$ does not contain
elements of $x$ we can exclude the corresponding linear function $l_i$ from the maximum.
For the new polyhedral function $L' = \max_{j \ne i} l_j$ we still have $\mathrm{ev}_x L' = p$.
Now, we repeat this procedure until the interior of each facet contains at least
one element of $x$ and denote the function we obtained by $L_x^p$. If a closed
proper convex function $h$ is such that $\mathrm{ev}_x h = p$ and $h \le L_x^p$, then
for any facet $C_i$ and corresponding linear function $l_i$ we have $h \le l_i$ on $C_i$,
and the supremum of $h$ on the convex set $C_i$ is attained at an interior point
$x_j \in x$. By Theorem 32.1 Rockafellar [1970], $h = L_x^p$ on $C_i$. Thus $h = L_x^p$ and
$L_x^p$ is a minimal element of our family. □

Lemma 4.16. For a linear function $l(x) = a^T x + b$ the polyhedral set $A =
\{l \ge c\} \cap \mathbb{R}^d_+$ is bounded if and only if all coordinates of $a$ are negative. In
this case, if $b > c$ the set $A$ is a simplex with vertices $p_i = ((c - b)/a_i)\, e_i$ and
$0$, where the $e_i$ are basis vectors. Otherwise, $A$ is empty.

Proof. If a coordinate $a_i$ is nonnegative then the direction $\{\lambda e_i\}$, $\lambda > 0$,
belongs to the recession cone of $A$ and thus $A$ is unbounded. If all coordinates
$a_j$ are negative and $b \le c$ the set $A$ is either empty or consists of the zero
vector $0$. Finally, if all $a_i$ are negative and $b > c$ then for $x \in A$ we can define
$\theta_i = a_i x_i/(c - b) \ge 0$. Then $1 \ge \sum_i \theta_i$ and $x = \sum_i \theta_i p_i$, which proves that $A$
is a simplex. □

4.4. Strong convexity. Following Rockafellar and Wets [1998], page 565,
we say that a proper convex function $h : \mathbb{R}^d \to \mathbb{R}$ is
strongly convex if there exists a constant $\sigma > 0$ such that:

\[ h(\theta x + (1 - \theta) y) \le \theta h(x) + (1 - \theta) h(y) - \tfrac{1}{2} \sigma \theta (1 - \theta) \|x - y\|^2 \tag{4.25} \]

for all $x$, $y$ and $\theta \in (0, 1)$. There is a simple characterization of strong
convexity:

Lemma 4.17. A proper convex function $f : \mathbb{R}^d \to \mathbb{R}$ is strongly convex
with constant $\sigma$ if and only if the function $f(x) - \tfrac{1}{2}\sigma \|x\|^2$ is convex.
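
A small numerical illustration of (4.25) and Lemma 4.17, using an example not taken from the text: $f(x) = x^2 + |x|$ on $\mathbb{R}$, for which $f(x) - \tfrac{1}{2}\sigma x^2 = |x|$ is convex when $\sigma = 2$, while for $\sigma = 2.5$ the difference $|x| - 0.25 x^2$ is not convex. The helper name `satisfies_4_25` is hypothetical.

```python
import random

def f(x):
    """f(x) = x^2 + |x|; f(x) - (sigma/2) x^2 equals |x| for sigma = 2."""
    return x * x + abs(x)

def satisfies_4_25(x, y, theta, sigma, tol=1e-9):
    """Check inequality (4.25) for f at one triple (x, y, theta)."""
    lhs = f(theta * x + (1.0 - theta) * y)
    rhs = (theta * f(x) + (1.0 - theta) * f(y)
           - 0.5 * sigma * theta * (1.0 - theta) * (x - y) ** 2)
    return lhs <= rhs + tol

random.seed(1)
samples = [(random.uniform(-5, 5), random.uniform(-5, 5), random.random())
           for _ in range(1000)]
print(all(satisfies_4_25(x, y, t, sigma=2.0) for x, y, t in samples))
print(satisfies_4_25(1.0, 2.0, 0.5, sigma=2.5))   # sigma too large: (4.25) fails here
```

The quadratic part of $f$ satisfies (4.25) with equality for $\sigma = 2$, so the tolerance `tol` only absorbs floating-point rounding.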

Since we need more precise control over the curvature of a convex function,
we define a generalization of strong convexity based on the characterization
above:

Definition 4.18. We say that a proper convex function $h : \mathbb{R}^d \to \mathbb{R}$
is $G$-strongly convex if there exist a point $x_0$, a positive semidefinite $d \times d$
matrix $G$ and a convex function $q$ such that:

\[ h(x) = \tfrac{1}{2}(x - x_0)^T G (x - x_0) + q(x) \quad \text{for all } x. \tag{4.26} \]

Obviously, strong convexity is equivalent to $\sigma I$-strong convexity. Note
that the definition does not depend on the choice of $x_0$.

Definition 4.19. We say that a proper convex function $h : \mathbb{R}^d \to \mathbb{R}$ is
locally $G$-strongly convex at a point $x_0$ if there exist an open neighborhood
of $x_0$, a positive semidefinite $d \times d$ matrix $G$ and a convex function $q$ such
that (4.26) holds for any $x$ in this neighborhood.

We can relate $G$-strong convexity to the Hessian of a smooth convex
function:

Lemma 4.20. If a proper convex function $h : \mathbb{R}^d \to \mathbb{R}$ is continuously
twice differentiable at $x_0$ then $h$ is locally $(1 - \varepsilon)\nabla^2 h(x_0)$-strongly convex for any
$\varepsilon \in (0, 1)$.

The last result suggests the following definition:

Definition 4.21. For a proper convex function $h : \mathbb{R}^d \to \mathbb{R}$ we define the
curvature $\mathrm{curv}_{x_0}\, h$ at a point $x_0$ as:

\[ \mathrm{curv}_{x_0}\, h = \sup_{G \in SC(h; x_0)} \det(G), \tag{4.27} \]

where $SC(h; x_0)$ is the set of all positive semidefinite matrices $G$ such that
$h$ is locally $G$-strongly convex at $x_0$.

Lemma 4.20 implies that:

Lemma 4.22. If a proper convex function $h : \mathbb{R}^d \to \mathbb{R}$ is continuously
twice differentiable at $x_0$ and the Hessian $\nabla^2 h(x_0)$ is positive definite then

\[ \mathrm{curv}_{x_0}\, h = \det(\nabla^2 h(x_0)). \tag{4.28} \]

APPENDIX A: NOTATION



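$\mathbb{R} = (-\infty, +\infty)$

$\overline{\mathbb{R}} = [-\infty, +\infty]$

$\mathbb{R}_+ = [0, +\infty)$

$\overline{\mathbb{R}}_+ = [0, +\infty]$

$|x| = \prod_{k=1}^d x_k$, $x \in \mathbb{R}^d_+$

$\mathcal{C} = \{f : \mathbb{R}^d \to \overline{\mathbb{R}} \mid f \text{ closed proper convex function}\}$

$\mathcal{P} = \{p : \mathbb{R}^d \to \overline{\mathbb{R}}_+ \mid p \text{ density}\}$

$\mathcal{P}(h) = \{h \circ g \mid g \in \mathcal{C},\ h \circ g \in \mathcal{P}\}$

$L_n g = \mathbb{P}_n \log h \circ g$

$\mathrm{ev}_x f = (f(x_1), \ldots, f(x_n))$

$\mathrm{supp}(f) = \{x \mid f(x) \ne 0\}$

$\delta(\cdot \mid C) = \infty \cdot 1_{C^c} + 0 \cdot 1_C$

$\mathrm{lev}_y\, g = \{x \mid g(x) \le y\}$

$\mu[S]$ = Lebesgue measure of $S$

$\{f = a\} = \{x \in X \mid f(x) = a\}$

$B(x_0; r) = \{x : \|x - x_0\| < r\}$

$B_H(x_0; r) = \{x : (x - x_0)^T H (x - x_0) < r^2\}$

$\mathrm{curv}_x\, h$ = curvature of a convex function $h$ at a point $x$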